Abstract

When the study variable is functional and storage capacities are limited
or transmission costs are high, selecting a small fraction of the observations
with survey sampling techniques is an interesting alternative to signal
compression techniques, particularly when the goal is the estimation of simple
quantities such as means or totals. We extend, in this functional framework,
model-assisted estimators with linear regression models that can take
account of auxiliary variables whose totals over the population are known.
We first show, under weak hypotheses on the sampling design and the
regularity of the trajectories, that the estimator of the mean function as well
as its variance estimator are uniformly consistent. Then, under additional
assumptions, we prove a functional central limit theorem and we rigorously
assess a fast technique based on simulations of Gaussian processes which
is employed to build asymptotic confidence bands. The accuracy of the
variance function estimator is evaluated on a real dataset of sampled electricity
consumption curves measured every half an hour over a period of one week.
1. Introduction

Survey sampling techniques, which consist in randomly selecting only a part of 
the elements of a population, are interesting alternatives to signal compression 
when one has to deal with very large populations of quantities that evolve along 
time. With the development of automatic sensors such very large datasets of 
temporal data are not unusual anymore and survey sampling techniques offer 
a good trade-off between accuracy of the estimators and size of the analyzed
data. Examples can be found in different domains such as internet traffic
monitoring (see Callado et al. (2009)) or estimation of energy consumption measured
by individual smart meters. Motivated by the estimation of mean electricity
consumption profiles measured every half an hour over one week, Cardot and
Josserand (2011) introduced Horvitz-Thompson estimators of the mean
function and showed, under weak hypotheses on the regularity of the functional
trajectories and the sampling design, that one gets uniformly convergent
estimators. They also proved a functional central limit theorem, in the space of
continuous functions, that can, in part, justify the construction of asymptotic
confidence bands. More recently, Cardot et al. (2012b) compared, in
terms of precision of the mean estimators of electricity load curves and width of
the confidence bands, different sampling approaches that can take auxiliary
information into account. One of the conclusions of this empirical study was
that very simple strategies based on simple sampling designs (such as simple
random sampling without replacement) can be improved substantially if some
well-chosen auxiliary information, whose total is known for the whole population,
is also taken into account at the estimation stage, with model-assisted estimators.
Important variables for the electricity consumption such as temperature
or geographical location were not available for these datasets, so that only one
auxiliary variable, the mean past consumption over the previous period, was
taken into account. Its correlation with the current consumption is always very
high (see Figure 1), so that linear regression models are natural candidates for
assisting the Horvitz-Thompson estimator. More generally, one advantage of
linear approaches is that they only require the knowledge of the auxiliary variable
totals in the population. More sophisticated nonlinear or nonparametric
approaches would require knowing the values of the auxiliary variables
for all the elements of the population.

Thus, we focus in this paper on linear relationships between the set of auxiliary
variables and the response at each instant $t$ of the current period. The
regression coefficients vary in time (see Faraway (1997) or Ramsay and Silverman
(2005)), so that the model-assisted estimator can be seen as a direct
extension, to a functional or time-varying context, of the generalized regression
(GREG) estimators studied in Robinson and Särndal (1983) and Särndal et al.
(1992). Note also that, from another point of view, the model-assisted estimator
can be obtained using a calibration technique (Deville and Särndal (1992)).
Confidence bands are then built using a simulation technique developed in
[...] (2006) and Degras (2011): one first estimates the
covariance function of the mean estimator and then, assuming asymptotic normality,
performs simulations of a centered Gaussian process whose covariance
function is the covariance function estimated at the previous step. We can, this
way, obtain an approximation to the law of the "sup" and deduce confidence
bands for the mean trajectory. In a recent work, Cardot et al. (2012a) have given
a rigorous mathematical justification of this technique for sampled functional
data and Horvitz-Thompson estimators for the mean. The required theoretical
ingredients that can justify such a procedure are the functional central limit
theorem for the mean estimator, in the space of continuous functions equipped
with the sup-norm, as well as a uniformly consistent estimator of the variance
function.

Figure 1. Correlation between the current consumption at each instant t of the week under
study and the total past consumption of the week before.

The aim of this paper is to study the asymptotic properties of model-assisted
estimators and to show that we obtain, under classical assumptions, a uniformly
consistent estimator of the mean as well as of its variance function. One additional
difficulty is that, for model-assisted estimators, the variance function cannot
be derived exactly and we can only have asymptotic approximations. Then,
we deduce that the confidence bands built via simulations have asymptotically
the desired coverage. In Section 2, we introduce notations and we suggest a
slight modification of the model-assisted estimators which permits control of
the variance of the regression coefficient estimator. Under classical assumptions
on the sampling design and on the regularity of the trajectories, we state, in
Section 3, the uniform convergence of the model-assisted estimators to the mean
function. Under additional assumptions on the design we also prove that we can
get a consistent estimator of the covariance function and a functional central
limit theorem that can justify rigorously that the confidence bands built with
the procedure based on Gaussian process simulations attain asymptotically the
desired level of confidence. In Section 4, we assess the precision of the variance
estimator on the real dataset consisting of electricity consumption curves studied
in Cardot et al. (2012b) and observe that, in our context, the approximation
error is negligible compared to the sampling error. A brief discussion about possible
extensions and future investigation is proposed in Section 5. All the proofs
are gathered in an Appendix.



2. Notations and estimators 

2.1. The Horvitz-Thompson estimator for functional data

Let us consider a finite population $U_N = \{1, \ldots, N\}$ of size $N$, supposed to be
known, and suppose that, for each unit $k$ of the population $U_N$, we can observe a
deterministic curve $Y_k = (Y_k(t))_{t \in [0,T]}$. The target is the mean trajectory $\mu_N(t)$,
$t \in [0,T]$, defined as follows:

$$\mu_N(t) = \frac{1}{N} \sum_{k \in U_N} Y_k(t). \tag{1}$$

We consider a sample $s$, with size $n$, drawn from $U_N$ according to a fixed-size
sampling design $p_N(s)$, where $p_N(s)$ is the probability of drawing the sample
$s$. For simplicity of notations, the subscript $N$ is omitted when there is no
ambiguity. We suppose that the first and second order inclusion probabilities
satisfy $\pi_k = P(k \in s) > 0$, for all $k \in U$, and $\pi_{kl} = P(k, l \in s) > 0$ for all
$k, l \in U_N$, $k \neq l$. Without auxiliary information, the population mean curve
$\mu(t)$ is often estimated by the Horvitz-Thompson estimator, defined as follows
for $t \in [0,T]$,

$$\hat\mu(t) = \frac{1}{N} \sum_{k \in s} \frac{Y_k(t)}{\pi_k} = \frac{1}{N} \sum_{k \in U} \frac{Y_k(t)}{\pi_k} \mathbb{1}_k, \tag{2}$$

where $\mathbb{1}_k$ is the sample membership indicator, $\mathbb{1}_k = 1$ if $k \in s$ and $\mathbb{1}_k = 0$
otherwise. For each $t \in [0,T]$, the estimator $\hat\mu(t)$ is design-unbiased for $\mu(t)$, i.e.
$E_p(\hat\mu(t)) = \mu(t)$, where $E_p[\cdot]$ denotes expectation with respect to the sampling
design.

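The Horvitz-Thompson covariance function of $\hat\mu$ between two instants $r$ and
$t$, computed with respect to the sampling design, is defined as follows

$$\mathrm{Cov}_p(\hat\mu(r), \hat\mu(t)) = \frac{1}{N^2} \sum_{k \in U} \sum_{l \in U} (\pi_{kl} - \pi_k \pi_l) \frac{Y_k(r)}{\pi_k} \frac{Y_l(t)}{\pi_l}, \quad r, t \in [0,T], \tag{3}$$

with the convention $\pi_{kk} = \pi_k$. Note that for $r = t$, we obtain the Horvitz-Thompson variance function.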
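
As an illustration, the Horvitz-Thompson mean curve and its design covariance can be computed on a discretization grid along the following lines. This is only a sketch: it assumes simple random sampling without replacement (SRSWOR), for which $\pi_k = n/N$ and $\pi_{kl} = n(n-1)/\{N(N-1)\}$, and the toy population as well as the function names `ht_mean_curve` and `ht_cov_srswor` are hypothetical and not taken from the paper.

```python
import numpy as np

def ht_mean_curve(Y_s, pi_s, N):
    """Horvitz-Thompson estimator of the mean curve on a grid.

    Y_s  : (n, D) array, sampled curves observed at D common instants
    pi_s : (n,) array, first-order inclusion probabilities of the sampled units
    N    : population size
    """
    return (Y_s / pi_s[:, None]).sum(axis=0) / N

def ht_cov_srswor(Y, n):
    """Design covariance of the HT mean curve under SRSWOR (assumed design).

    Under SRSWOR the double sum over the population reduces to
    (1 - n/N) / n times the population covariance with divisor N - 1.
    """
    N = Y.shape[0]
    Yc = Y - Y.mean(axis=0)
    S = Yc.T @ Yc / (N - 1)
    return (1.0 - n / N) / n * S

# Toy population of curves (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
N, n, D = 200, 40, 5
Y = rng.normal(size=(N, D)).cumsum(axis=1)
s = rng.choice(N, size=n, replace=False)     # one SRSWOR sample
pi_s = np.full(n, n / N)
mu_hat = ht_mean_curve(Y[s], pi_s, N)
```

Under SRSWOR the Horvitz-Thompson estimator coincides with the sample mean curve, which gives a quick sanity check of the implementation.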



2.2. The mean curve estimator assisted by a functional linear model

Let us suppose now that for each unit $k \in U$ we can also observe $p$ real variables,
$X_1, \ldots, X_p$, and let us denote by $\mathbf{x}_k = (x_{k1}, \ldots, x_{kp})'$ the value of the auxiliary
variable vector for each unit $k$ in the population. We introduce an estimator
based on a linear regression model that can use these variables in order to
improve the accuracy of $\hat\mu$. By analogy to the real case (see e.g. Särndal et al.
(1992)) we suppose that the relationship between the functional variable of
interest and the auxiliary variables is modeled by the superpopulation model $\xi$
defined as follows:

$$\xi: \quad Y_k(t) = \mathbf{x}_k' \beta(t) + \epsilon_{kt}, \quad t \in [0,T], \tag{4}$$

where $\beta(t) = (\beta_1(t), \ldots, \beta_p(t))'$ is the vector of functional regression coefficients,
$\epsilon_{kt}$ are independent (across units) and centered continuous time processes,
$E_\xi(\epsilon_{kt}) = 0$, with covariance function $\mathrm{Cov}_\xi(\epsilon_{kt}, \epsilon_{kr}) = \Gamma(t, r)$, for $(t, r) \in
[0,T] \times [0,T]$. This model is a direct extension to several variables of the functional
linear model proposed by Faraway (1997).



If $\mathbf{x}_k$ and $Y_k$ are known for all units $k \in U$ and if the matrix $G = \frac{1}{N} \sum_{k \in U} \mathbf{x}_k \mathbf{x}_k'$
is invertible, it is possible, under the model $\xi$, to estimate $\beta(t)$ by $\tilde\beta(t) =
G^{-1} \frac{1}{N} \sum_{k \in U} \mathbf{x}_k Y_k(t)$, the ordinary least squares estimator. Then, the mean
curve $\mu(t)$ can be estimated by the generalized difference estimator (see Särndal
et al. (1992), Chapter 6) defined as follows for all $t \in [0,T]$,

$$\tilde\mu(t) = \frac{1}{N} \sum_{k \in U} \tilde Y_k(t) - \frac{1}{N} \sum_{k \in s} \frac{\tilde Y_k(t) - Y_k(t)}{\pi_k},$$

where $\tilde Y_k(t) = \mathbf{x}_k' \tilde\beta(t)$.

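In practice, we do not know $Y_k(t)$ except for $k \in s$, and it is not possible to
compute $\tilde\beta(t)$. An estimator of $\mu(t)$ is obtained by substituting each total in
$\tilde\beta(t)$ by its Horvitz-Thompson estimator. Thus, if the matrix $\hat G = \frac{1}{N} \sum_{k \in s} \frac{\mathbf{x}_k \mathbf{x}_k'}{\pi_k}$
is invertible, $\tilde\beta(t)$ is estimated by:

$$\hat\beta(t) = \hat G^{-1} \frac{1}{N} \sum_{k \in s} \frac{\mathbf{x}_k Y_k(t)}{\pi_k}.$$

Remark that the denominator $N$ is used in the expression of $\hat\beta(t)$ for asymptotic
purposes and need not be estimated. The model-assisted estimator $\hat\mu_{MA}(t)$ is
then defined by replacing $\tilde\beta(t)$ by $\hat\beta(t)$ in the generalized difference estimator,

$$\hat\mu_{MA}(t) = \frac{1}{N} \sum_{k \in U} \hat Y_k(t) - \frac{1}{N} \sum_{k \in s} \frac{\hat Y_k(t) - Y_k(t)}{\pi_k}, \quad t \in [0,T],$$

where $\hat Y_k(t) = \mathbf{x}_k' \hat\beta(t)$. Since $\sum_{k \in U} \hat Y_k(t) = \left(\sum_{k \in U} \mathbf{x}_k\right)' \hat\beta(t)$, the only required
information to build $\hat\mu_{MA}(t)$ is $\mathbf{x}_k$ and $Y_k(t)$ for all the units $k \in s$ as well as
the population totals of the auxiliary variables, $\sum_{k \in U} \mathbf{x}_k$.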
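
The construction above can be sketched numerically as follows, with curves observed on a common grid of $D$ instants. The function name `model_assisted_mean`, the toy population and the SRSWOR sample are assumptions made for illustration; they are not part of the paper.

```python
import numpy as np

def model_assisted_mean(X_s, Y_s, pi_s, x_total, N):
    """Model-assisted estimator of the mean curve on a grid.

    X_s     : (n, p) auxiliary vectors x_k of the sampled units
    Y_s     : (n, D) sampled curves Y_k(t_1), ..., Y_k(t_D)
    pi_s    : (n,) inclusion probabilities pi_k
    x_total : (p,) known population totals of the auxiliary variables
    N       : population size
    """
    W = X_s / pi_s[:, None]                            # rows x_k / pi_k
    G_hat = W.T @ X_s / N                              # (1/N) sum_s x_k x_k' / pi_k
    beta_hat = np.linalg.solve(G_hat, W.T @ Y_s / N)   # (p, D): one column per instant
    fitted_s = X_s @ beta_hat                          # predicted curves for k in s
    ht_resid = ((fitted_s - Y_s) / pi_s[:, None]).sum(axis=0) / N
    return x_total @ beta_hat / N - ht_resid, beta_hat

# Toy population (hypothetical): intercept plus one positive covariate
rng = np.random.default_rng(0)
N, n, D = 500, 50, 6
X = np.column_stack([np.ones(N), rng.gamma(2.0, 1.0, size=N)])
beta_true = rng.normal(size=(2, D))
Y = X @ beta_true + 0.2 * rng.normal(size=(N, D))
s = rng.choice(N, size=n, replace=False)
pi_s = np.full(n, n / N)
mu_ma, beta_hat = model_assisted_mean(X[s], Y[s], pi_s, X.sum(axis=0), N)
```

Only the sampled pairs and the population totals `X.sum(axis=0)` enter the computation, which mirrors the information requirement stated above.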

Remark 1. If the vector of auxiliary information contains the intercept (constant
term), then it can be shown (see Särndal (1980)) that the Horvitz-Thompson
estimator of the estimated residuals $Y_k(t) - \hat Y_k(t)$ is equal to zero for each
$t \in [0,T]$. This means that the model-assisted estimator $\hat\mu_{MA}$ reduces in this
case to the mean in the population of the predicted values $\hat Y_k(t)$,

$$\hat\mu_{MA}(t) = \frac{1}{N} \sum_{k \in U} \hat Y_k(t), \quad t \in [0,T].$$

Moreover, if only the intercept term is used, namely $Y_k(t) = \beta(t) + \epsilon_{kt}$ for all
$k \in U$, then the estimator $\hat\mu_{MA}$ is simply the well known Hájek estimator,

$$\hat\mu_{MA}(t) = \frac{\sum_{k \in s} Y_k(t)/\pi_k}{\sum_{k \in s} 1/\pi_k}, \quad t \in [0,T],$$

which is sometimes preferred to the Horvitz-Thompson estimator (see e.g. Särndal
et al. (1992), Chapter 5.7).



Remark 2. Estimator $\hat\mu_{MA}(t)$ may also be obtained by using a calibration
approach (Deville and Särndal (1992)) which consists in looking for weights
$w_{ks}$, $k \in s$, that are as close as possible, according to some distance, to the sampling
weights $1/\pi_k$ while estimating exactly the population totals of the auxiliary
information,

$$\sum_{k \in s} w_{ks} \mathbf{x}_k = \sum_{k \in U} \mathbf{x}_k.$$

Considering the chi-square distance leads to the following choice of weights

$$w_{ks} = \frac{1}{\pi_k} - \left( \sum_{l \in s} \frac{\mathbf{x}_l}{\pi_l} - \sum_{l \in U} \mathbf{x}_l \right)' \left( \sum_{l \in s} \frac{\mathbf{x}_l \mathbf{x}_l'}{\pi_l} \right)^{-1} \frac{\mathbf{x}_k}{\pi_k}$$

and the calibration estimator $\sum_{k \in s} w_{ks} Y_k(t)/N$ for the mean $\mu(t)$ is equal to
the model-assisted estimator $\hat\mu_{MA}(t)$ defined above.
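
The equivalence stated in Remark 2 can be checked numerically. The following sketch, written under the assumption of an SRSWOR sample and a hypothetical toy population, computes the chi-square calibration weights and compares the resulting calibration estimator with the model-assisted estimator computed directly.

```python
import numpy as np

# Toy population (hypothetical): intercept plus one covariate, D grid points
rng = np.random.default_rng(1)
N, n, D = 300, 60, 4
X = np.column_stack([np.ones(N), rng.gamma(2.0, 1.0, size=N)])
Y = X @ rng.normal(size=(2, D)) + 0.1 * rng.normal(size=(N, D))
s = rng.choice(N, size=n, replace=False)     # SRSWOR sample (assumed design)
pi_s = np.full(n, n / N)
X_s, Y_s, x_total = X[s], Y[s], X.sum(axis=0)

W = X_s / pi_s[:, None]                      # rows x_k / pi_k
T = W.T @ X_s                                # sum_s x_l x_l' / pi_l
gap = W.sum(axis=0) - x_total                # HT estimate of the totals minus true totals
w = 1.0 / pi_s - W @ np.linalg.solve(T, gap) # calibrated weights w_ks

# Calibration estimator of the mean curve
mu_cal = w @ Y_s / N

# Model-assisted estimator computed directly
beta_hat = np.linalg.solve(T, W.T @ Y_s)
mu_ma = x_total @ beta_hat / N - ((X_s @ beta_hat - Y_s) / pi_s[:, None]).sum(axis=0) / N
```

The weights `w` depend on the sample and on the auxiliary variables only, so the same weights can be reused for every instant of the grid.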

2.3. A regularized estimator for asymptotics

The construction of the estimator $\hat\mu_{MA}(t)$ is based on the assumption that
the matrix $\hat G$ is invertible. To show the uniform convergence, we consider a
modification of $\hat G$ which will permit control of the expected norm of its inverse.
Similar regularization ideas can be found in Bosq (2000) and Guillas
(2001). Since $\hat G$ is a $p \times p$ symmetric and non negative matrix it is possible to
write it as follows

$$\hat G = \sum_{j=1}^{p} \eta_{j,n} \mathbf{v}_{jn} \mathbf{v}_{jn}',$$

where $\eta_{j,n}$ is the $j$th eigenvalue, $\eta_{1,n} \geq \cdots \geq \eta_{p,n} \geq 0$, and $\mathbf{v}_{jn}$ is the corresponding
orthonormal eigenvector. Let us consider a real number $a > 0$ and
define the following regularized estimator of $G$,

$$\hat G_a = \sum_{j=1}^{p} \max(\eta_{j,n}, a) \, \mathbf{v}_{jn} \mathbf{v}_{jn}'.$$

It is clear that $\hat G_a$ is always invertible and

$$\|\hat G_a^{-1}\| \leq a^{-1}, \tag{7}$$

where $\|\cdot\|$ is the spectral norm for matrices. Furthermore, if $\eta_{p,n} \geq a$ then
$\hat G = \hat G_a$. If $a > 0$ is small enough, we show under standard conditions on the
moments of the variables $X_1, \ldots, X_p$ and on the first and second order inclusion
probabilities that $P(\hat G \neq \hat G_a) = P(\eta_{p,n} < a) = O(n^{-1})$ (see Lemma A.1 in the
Appendix).

Consequently, it is possible to estimate the mean function $\mu_N(t)$ by the following
estimator

$$\hat\mu_{MA,a}(t) = \frac{1}{N} \sum_{k \in U} \hat Y_{k,a}(t) - \frac{1}{N} \sum_{k \in s} \frac{\hat Y_{k,a}(t) - Y_k(t)}{\pi_k}, \quad t \in [0,T], \tag{8}$$

where $\hat Y_{k,a}(t) = \mathbf{x}_k' \hat\beta_a(t)$ and $\hat\beta_a(t) = \hat G_a^{-1} \frac{1}{N} \sum_{k \in s} \frac{\mathbf{x}_k Y_k(t)}{\pi_k}$.
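
A minimal sketch of this eigenvalue thresholding is given below. The helper name `regularize` and the two toy matrices are hypothetical; the sketch only illustrates that the regularized matrix is invertible with an inverse whose spectral norm is at most $a^{-1}$, and that a well-conditioned matrix is left unchanged.

```python
import numpy as np

def regularize(G_hat, a):
    """Replace each eigenvalue eta_j of a symmetric non-negative matrix by max(eta_j, a)."""
    eta, V = np.linalg.eigh(G_hat)           # eigenvalues (ascending), orthonormal eigenvectors
    return (V * np.maximum(eta, a)) @ V.T    # sum_j max(eta_j, a) v_j v_j'

a = 1e-3
G_hat = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-10]])   # nearly singular toy matrix
G_a = regularize(G_hat, a)
inv_norm = np.linalg.norm(np.linalg.inv(G_a), 2)     # spectral norm of the inverse

G_ok = np.array([[2.0, 0.5], [0.5, 1.0]])            # smallest eigenvalue well above a
G_ok_a = regularize(G_ok, a)
```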
2.4. Discretized observations

Note finally that with real data, we do not observe $Y_k(t)$ at all instants $t$ in $[0,T]$
but only for a finite set of $D_N$ measurement times, $0 = t_1 < \ldots < t_{D_N} = T$. In
functional data analysis, when the noise level is low and the grid of discretization
points is fine, it is usual to perform a linear interpolation or to smooth the
discretized trajectories in order to obtain approximations of the trajectories at
every instant $t \in [0,T]$ (see Ramsay and Silverman (2005)).

If there are no measurement errors, if the trajectories are regular enough
(but not necessarily differentiable) and if the grid of discretization points is
dense enough, Cardot and Josserand (2011) showed that linear interpolation
can provide sufficiently accurate approximations to the trajectories so that the
approximation error can be neglected compared to the sampling error for the
Horvitz-Thompson estimator. Note also that even if the observations are corrupted
by noise, it has been shown by simulations in Cardot et al. (2012a) that
smoothing really improves the accuracy of the Horvitz-Thompson estimator
only when the noise level is high.

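Thus, for each unit $k$ in the sample $s$, we build the interpolated trajectory

$$Y_{k,d}(t) = Y_k(t_i) + \frac{Y_k(t_{i+1}) - Y_k(t_i)}{t_{i+1} - t_i} (t - t_i), \quad t \in [t_i, t_{i+1}],$$

and we define $\hat\beta_{a,d}(t)$ as the estimator of $\beta(t)$ based on the discretized observations
as follows

$$\hat\beta_{a,d}(t) = \hat G_a^{-1} \frac{1}{N} \sum_{k \in s} \frac{\mathbf{x}_k Y_{k,d}(t)}{\pi_k}
= \hat\beta_a(t_i) + \frac{\hat\beta_a(t_{i+1}) - \hat\beta_a(t_i)}{t_{i+1} - t_i} (t - t_i).$$

Therefore, the estimator of the mean population curve $\mu(t)$ based on the
discretized observations is obtained by linear interpolation between $\hat\mu_{MA,a}(t_i)$
and $\hat\mu_{MA,a}(t_{i+1})$. For $t \in [t_i, t_{i+1}]$,

$$\hat\mu_{MA,d}(t) = \frac{1}{N} \sum_{k \in U} \hat Y_{k,d}(t) - \frac{1}{N} \sum_{k \in s} \frac{\hat Y_{k,d}(t) - Y_{k,d}(t)}{\pi_k}
= \hat\mu_{MA,a}(t_i) + \frac{\hat\mu_{MA,a}(t_{i+1}) - \hat\mu_{MA,a}(t_i)}{t_{i+1} - t_i} (t - t_i), \tag{9}$$

where $\hat Y_{k,d}(t) = \mathbf{x}_k' \hat\beta_{a,d}(t)$.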
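
Because the estimators are linear in the observed trajectories, interpolating each sampled curve and then estimating gives the same result as estimating on the grid and then interpolating. The sketch below illustrates this with a design-weighted mean; the data, the grids and the inclusion probabilities are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 40, 10
t_grid = np.linspace(0.0, 1.0, 6)            # measurement times t_1 < ... < t_D
t_new = np.linspace(0.0, 1.0, 26)            # instants at which the curves are needed
Y_s = rng.normal(size=(n, t_grid.size))      # discretized sampled curves (toy data)
pi_s = rng.uniform(0.2, 0.8, size=n)         # arbitrary inclusion probabilities

# Route 1: interpolate every sampled trajectory, then apply the weighted estimator
Y_s_d = np.vstack([np.interp(t_new, t_grid, y) for y in Y_s])
mu_from_curves = (Y_s_d / pi_s[:, None]).sum(axis=0) / N

# Route 2: apply the weighted estimator on the grid, then interpolate the result
mu_grid = (Y_s / pi_s[:, None]).sum(axis=0) / N
mu_interp = np.interp(t_new, t_grid, mu_grid)
```

In practice the second route is cheaper, since a single curve is interpolated instead of $n$ trajectories.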



3. Asymptotic properties under the sampling design 



All the proofs are postponed to the Appendix.
3.1. Assumptions

To derive the asymptotic properties under the sampling design $p(\cdot)$ of $\hat\mu_{MA,d}$ we
must suppose that both the sample size and the population size become large.
More precisely, we consider the superpopulation framework introduced by Isaki
and Fuller (1982) with a sequence of growing and nested populations $U_N$ with
size $N$ tending to infinity and a sequence of samples $s_N$ of size $n_N$ drawn
from $U_N$ according to the sampling design $p_N(s_N)$. The first and second order
inclusion probabilities are respectively denoted by $\pi_{kN}$ and $\pi_{klN}$. For simplicity
of notations and when there is no ambiguity we drop the subscript $N$. To prove
our asymptotic results we need the following assumptions.

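A1. We assume that $\lim_{N \to \infty} \frac{n}{N} = \pi \in (0, 1)$.

A2. We assume that $\min_{k \in U} \pi_k \geq \lambda > 0$, $\min_{k \neq l} \pi_{kl} \geq \lambda^* > 0$ and

$$\limsup_{N \to \infty} \, n \max_{k \neq l \in U} |\pi_{kl} - \pi_k \pi_l| < C_1 < \infty.$$

A3. There are two positive constants $C_2$ and $C_3$ and $1 \geq \beta > 1/2$ such that,
for all $N$ and for all $(r, t) \in [0,T] \times [0,T]$,

$$\frac{1}{N} \sum_{k \in U} Y_k(0)^2 < C_2 \quad \text{and} \quad \frac{1}{N} \sum_{k \in U} \{Y_k(t) - Y_k(r)\}^2 < C_3 |t - r|^{2\beta}.$$

A4. We assume that there is a positive constant $C_4$ such that for all $k \in U$,
$\|\mathbf{x}_k\|^2 < C_4$.

A5. We assume that, for $N > N_0$, the matrix $G$ is invertible and that the
number $a$ introduced in Section 2.3 satisfies $\|G^{-1}\| < a^{-1}$.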

Assumptions A1 and A2 are classical hypotheses in survey sampling and
deal with the first and second order inclusion probabilities. They are satisfied
for many usual sampling designs with fixed size (see for example Hájek (1981),
Robinson and Särndal (1983) and Breidt and Opsomer (2000)).

Assumption A3 is a minimal regularity condition already required in Cardot
and Josserand (2011). Even if pointwise consistency, for each fixed value of $t$,
can be proved without any condition on the Hölder coefficient $\beta$, this regularity
condition is necessary to get a uniform convergence result. A counterexample
is given in Hahn (1977) when $\beta \leq 1/2$. More precisely it is shown that the
sample mean of i.i.d. copies of a uniformly bounded continuous random function
defined on a compact interval may not satisfy the Central Limit Theorem in
the space of continuous functions. The hypothesis $\beta > 1/2$ also implies that the
trajectories of the residual processes $\epsilon_{kt}$, see (4), are regular enough (but not
necessarily differentiable). Assumption A4 could certainly be weakened at the
expense of longer proofs. Assumption A5 means that for all $\mathbf{u} \in \mathbb{R}^p$, with $\mathbf{u} \neq 0$,
we have $\mathbf{u}' G \mathbf{u} > a \, \mathbf{u}' \mathbf{u}$. The same kind of assumption is required in Isaki and
Fuller (1982) to get the pointwise convergence in probability whereas Robinson
and Särndal (1983) introduce a much stronger condition (condition A7 in their
article) which directly deals with the mean square convergence of the estimator
of the vector $\beta$ of regression coefficients.
3.2. Uniform consistency of $\hat\mu_{MA,d}$

We aim at showing that $\hat\mu_{MA,d}$ is uniformly consistent for $\mu$, namely that, for
all $\varepsilon > 0$,

$$P\left( \sup_{t \in [0,T]} |\hat\mu_{MA,d}(t) - \mu(t)| > \varepsilon \right) \to 0,$$

when $N$ tends to infinity. The suitable space for proving the uniform convergence
is the space of continuous functions on $[0,T]$, denoted by $C[0,T]$, equipped with
its natural distance $\rho$; for two elements $f, g \in C[0,T]$, the distance between $f$
and $g$ is $\rho(f, g) = \sup_{t \in [0,T]} |f(t) - g(t)|$. It results that the uniform consistency
of $\hat\mu_{MA,d}$ is simply the convergence in probability of $\hat\mu_{MA,d}$ to $\mu$ in the space
$C[0,T]$. Remark that with assumption A3 the trajectories $Y_k$ are continuous for
all $k \in U$, and thus the mean curve $\mu$ belongs to $C[0,T]$ as well as its estimator
$\hat\mu_{MA,d}$, by construction.

We first state the uniform consistency of the estimator $\hat\beta_{a,d}(t)$ towards its
population counterpart $\tilde\beta(t)$ under conditions on the number and the distribution
of discretization points.

Proposition 3.1. Let assumptions (A1)-(A5) hold. If the discretization scheme
satisfies $\max_{i \in \{1, \ldots, D_N - 1\}} |t_{i+1} - t_i|^{2\beta} = o(n^{-1})$ then there is a constant $C > 0$
such that, for all $n$,

$$\sqrt{n} \, E_p \left\{ \sup_{t \in [0,T]} \| \hat\beta_{a,d}(t) - \tilde\beta(t) \| \right\} < C.$$

We can now state a similar type of result for the estimator of the mean function.

Proposition 3.2. Let assumptions (A1)-(A5) hold. If the discretization scheme
satisfies $\max_{i \in \{1, \ldots, D_N - 1\}} |t_{i+1} - t_i|^{2\beta} = o(n^{-1})$ then there is a constant $C > 0$
such that, for all $n$,

$$\sqrt{n} \, E_p \left\{ \sup_{t \in [0,T]} | \hat\mu_{MA,d}(t) - \mu(t) | \right\} < C.$$

We deduce from Proposition 3.2 that estimator $\hat\mu_{MA,d}(t)$ is asymptotically
unbiased as well as design consistent. Note that the approximation error (with
linear interpolation) is negligible, compared to the sampling variability, under
the additional assumption on the distribution of the discretization points. This
assumption also tells us that fewer discretization points are required for smoother
trajectories.

Let us also remark that, for each $t$,

$$\hat\mu_{MA,a}(t) - \tilde\mu(t) = \frac{1}{N} \sum_{k \in U} \left( 1 - \frac{\mathbb{1}_k}{\pi_k} \right) \mathbf{x}_k' \left( \hat\beta_a(t) - \tilde\beta(t) \right), \tag{10}$$

where $\mathbb{1}_k$ is the sample membership indicator, so that it is not difficult to prove, under
previous assumptions and by using Lemma A.4 in the Appendix, that for all
$t \in [0,T]$,

$$\sqrt{n} \left( \hat\mu_{MA,d}(t) - \tilde\mu(t) \right) = o_p(1). \tag{11}$$
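
The decomposition of the difference between the regularized model-assisted estimator and the generalized difference estimator follows from a short computation, sketched here with $\hat\mu$ denoting the Horvitz-Thompson estimator, $\tilde\mu$ the generalized difference estimator built with the population coefficient $\tilde\beta$, and $\mathbb{1}_k$ the sample membership indicator. Both estimators can be written as the Horvitz-Thompson estimator plus a correction term that is linear in the regression coefficient, and subtracting the two expressions gives the identity.

```latex
\begin{align*}
\hat\mu_{MA,a}(t)
  &= \frac{1}{N}\sum_{k \in s} \frac{Y_k(t)}{\pi_k}
     + \frac{1}{N}\sum_{k \in U} \mathbf{x}_k'\hat\beta_a(t)
     - \frac{1}{N}\sum_{k \in s} \frac{\mathbf{x}_k'\hat\beta_a(t)}{\pi_k}
   = \hat\mu(t) + \frac{1}{N}\sum_{k \in U}\Big(1 - \frac{\mathbb{1}_k}{\pi_k}\Big)\mathbf{x}_k'\hat\beta_a(t), \\
\tilde\mu(t)
  &= \hat\mu(t) + \frac{1}{N}\sum_{k \in U}\Big(1 - \frac{\mathbb{1}_k}{\pi_k}\Big)\mathbf{x}_k'\tilde\beta(t), \\
\hat\mu_{MA,a}(t) - \tilde\mu(t)
  &= \frac{1}{N}\sum_{k \in U}\Big(1 - \frac{\mathbb{1}_k}{\pi_k}\Big)\mathbf{x}_k'
     \big(\hat\beta_a(t) - \tilde\beta(t)\big).
\end{align*}
```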

3.3. Covariance function estimation under the sampling design 

We undertake in this section a detailed study of the covariance function of
estimator $\hat\mu_{MA,d}$. The covariance function is computed with respect to the sampling
design $p(\cdot)$ and from relation (9), we can deduce that $\hat\mu_{MA,d}$ is a nonlinear
function of Horvitz-Thompson estimators, so the usual Horvitz-Thompson covariance
formula given by (3) cannot be used anymore. Nevertheless, in light
of relation (11), the covariance function of $\hat\mu_{MA,d}$ between two instants $r$ and
$t$ may be approximated by the covariance $\mathrm{Cov}_p(\tilde\mu(r), \tilde\mu(t))$, which in turn is
equal to the Horvitz-Thompson covariance applied to the residuals $Y_k - \tilde Y_k$. Let
us denote by $\gamma_{MA}$ the approximate covariance function of $\hat\mu_{MA,d}$ defined as
follows

$$\gamma_{MA}(r, t) = \frac{1}{N^2} \mathrm{Cov}_p \left( \sum_{k \in s} \frac{Y_k(r) - \tilde Y_k(r)}{\pi_k}, \sum_{k \in s} \frac{Y_k(t) - \tilde Y_k(t)}{\pi_k} \right)
= \frac{1}{N^2} \sum_{k \in U} \sum_{l \in U} (\pi_{kl} - \pi_k \pi_l) \frac{Y_k(r) - \tilde Y_k(r)}{\pi_k} \frac{Y_l(t) - \tilde Y_l(t)}{\pi_l}, \quad r, t \in [0,T]. \tag{12}$$

This approximation explains that model-assisted estimators will perform much
better than Horvitz-Thompson estimators if the residuals $Y_k(t) - \tilde Y_k(t)$ are
small compared to $Y_k(t)$. The covariance function $\gamma_{MA}(r, t)$ can be estimated by
the Horvitz-Thompson variance estimator for the estimated residuals $Y_{k,d}(t) -
\hat Y_{k,d}(t)$,

$$\hat\gamma_{MA,d}(r, t) = \frac{1}{N^2} \sum_{k \in s} \sum_{l \in s} \frac{\pi_{kl} - \pi_k \pi_l}{\pi_{kl}} \frac{Y_{k,d}(r) - \hat Y_{k,d}(r)}{\pi_k} \frac{Y_{l,d}(t) - \hat Y_{l,d}(t)}{\pi_l}, \quad r, t \in [0,T], \tag{13}$$

where $\hat Y_{k,d}(t) = \mathbf{x}_k' \hat\beta_{a,d}(t)$.
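
On a grid, the Horvitz-Thompson covariance estimator applied to the estimated residuals can be sketched as below. The function name `ht_cov_estimator`, the toy population and the choice of SRSWOR are assumptions made for the example; under SRSWOR the double sum reduces to $(1 - n/N)/n$ times the sample covariance of the residuals, which provides a convenient check.

```python
import numpy as np

def ht_cov_estimator(E_s, pi_s, pi_kl, N):
    """Horvitz-Thompson covariance estimator applied to sampled residuals.

    E_s   : (n, D) residuals Y_k(t) - hat Y_k(t) of the sampled units
    pi_s  : (n,) first-order inclusion probabilities
    pi_kl : (n, n) joint inclusion probabilities, with pi_kk = pi_k on the diagonal
    N     : population size
    """
    Delta = (pi_kl - np.outer(pi_s, pi_s)) / pi_kl
    Z = E_s / pi_s[:, None]
    return Z.T @ Delta @ Z / N**2            # (D, D) matrix indexed by the instants (r, t)

# Toy population (hypothetical): intercept plus one covariate
rng = np.random.default_rng(3)
N, n, D = 400, 80, 5
X = np.column_stack([np.ones(N), rng.gamma(2.0, 1.0, size=N)])
Y = X @ rng.normal(size=(2, D)) + 0.3 * rng.normal(size=(N, D))
s = rng.choice(N, size=n, replace=False)

# SRSWOR inclusion probabilities (assumed design)
pi_s = np.full(n, n / N)
pi_kl = np.full((n, n), n * (n - 1) / (N * (N - 1)))
np.fill_diagonal(pi_kl, n / N)

# Residuals from the design-weighted least squares fit on the sample
W = X[s] / pi_s[:, None]
beta_hat = np.linalg.solve(W.T @ X[s], W.T @ Y[s])
E_s = Y[s] - X[s] @ beta_hat

gamma_hat = ht_cov_estimator(E_s, pi_s, pi_kl, N)
```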

To prove the consistency of the covariance estimator $\hat\gamma_{MA,d}(r, t)$, let us introduce
additional assumptions that involve higher-order inclusion probabilities as well 
as conditions on the fourth order moments of the trajectories. 

A6. We assume that

$$\lim_{N \to \infty} \max_{(k, l, k', l') \in D_{4,N}} \left| E_p \{ (\mathbb{1}_{kl} - \pi_{kl})(\mathbb{1}_{k'l'} - \pi_{k'l'}) \} \right| = 0,$$

where $D_{t,N}$ is the set of all distinct $t$-tuples $(i_1, \ldots, i_t)$ from $U_N$ and $\mathbb{1}_{kl} =
\mathbb{1}_k \mathbb{1}_l$.

A7. There are two positive constants $C_5$ and $C_6$ such that $N^{-1} \sum_{k \in U} Y_k(0)^4 <
C_5$ and $N^{-1} \sum_{k \in U} \{Y_k(t) - Y_k(r)\}^4 < C_6 |t - r|^{4\beta}$, for all $(r, t) \in [0,T]^2$.



Condition A6 has already been assumed by Breidt and Opsomer (2000) in a
nonparametric model-assisted context and in Cardot and Josserand (2011) for
Horvitz-Thompson estimators. It can be checked that it is fulfilled for simple
random sampling without replacement (SRSWOR) or stratified sampling with
SRSWOR within each stratum. More generally, it is fulfilled for high entropy
sampling designs: Boistard et al. (2012) deal with rejective
sampling whereas Cardot et al. (2012c) deal with sampling designs,
such as Sampford sampling, whose Kullback-Leibler divergence with respect to
rejective sampling tends to zero when the population size increases.

Proposition 3.3. Assume (A1)-(A7) hold and the sequence of discretization
schemes satisfies $\lim_{N \to \infty} \max_{i \in \{1, \ldots, D_N - 1\}} |t_{i+1} - t_i| = 0$. Then, as $N$ tends to
infinity, we have for all $(r, t) \in [0,T]^2$,

$$n \, E_p \{ |\hat\gamma_{MA,d}(r, t) - \gamma_{MA}(r, t)| \} \to 0$$

and

$$n \, E_p \left\{ \sup_{t \in [0,T]} |\hat\gamma_{MA,d}(t, t) - \gamma_{MA}(t, t)| \right\} \to 0.$$

Since $n \gamma_{MA}(r, t)$ remains bounded, the previous proposition tells us that
$\hat\gamma_{MA,d}$ is consistent pointwise and the variance function estimator is uniformly
convergent. Note also that the condition on the number of discretization points
is much weaker than in Proposition 3.2 because we do not give here rates of
convergence. To obtain such rates, we would also need to impose additional
assumptions on the sampling design.



3.4. Asymptotic normality and confidence bands



We introduce a supplementary assumption in order to get the asymptotic normality
of the functional estimator $\hat\mu_{MA,d}$ in the space of continuous functions.

A8. We assume that for each fixed value of $t \in [0,T]$,

$$\{\gamma_{MA}(t, t)\}^{-1/2} (\tilde\mu(t) - \mu(t)) \to N(0, 1)$$

in distribution when $N$ tends to infinity.

This assumption is satisfied for usual sampling designs (see e.g. Fuller (2009),
Chapter 2.2). Note that using relation (11), we can write for any fixed value
$t \in [0,T]$,

$$\hat\mu_{MA,d}(t) - \mu(t) = \tilde\mu(t) - \mu(t) + o_p(n^{-1/2}),$$

and deduce that $\sqrt{n} (\hat\mu_{MA,d}(t) - \mu(t))$ is also pointwise asymptotically Gaussian
when conditions of Proposition 3.1 hold. Let us state now a much stronger result
which indicates that the convergence to a Gaussian distribution also occurs for
the trajectories, in the space of continuous functions (see Billingsley (1968),
Chapter 2).

Proposition 3.4. Assume (A1)-(A5) and (A8) hold. If the discretization scheme
satisfies $\max_{i \in \{1, \ldots, D_N - 1\}} |t_{i+1} - t_i|^{2\beta} = o(n^{-1})$, we have when $n$ tends to infinity

$$\sqrt{n} \, (\hat\mu_{MA,d} - \mu) \rightsquigarrow Z,$$

where $\rightsquigarrow$ indicates the convergence in distribution in $C[0,T]$ with the uniform
topology and $Z$ is a Gaussian process taking values in $C[0,T]$ with mean 0 and
covariance function $\gamma_Z(r, t) = \lim_{n \to +\infty} n \gamma_{MA}(r, t)$.

The "sup" functional defined on the space of continuous functions being continuous,
Proposition 3.4 implies that the real random variable $\sup_t |\sqrt{n} \{\hat\mu_{MA,d}(t) - \mu(t)\}|$
converges in distribution to $\sup_t |Z(t)|$. We thus consider confidence bands for
$\mu$ of the form

$$\left\{ \left[ \hat\mu_{MA,d}(t) \pm c \, \frac{\hat\sigma(t)}{\sqrt{n}} \right], \ t \in [0,T] \right\}, \tag{14}$$

where $c$ is a suitable number and $\hat\sigma(t) = \sqrt{n \hat\gamma_{MA,d}(t, t)}$. Note that the fact that
$\mu$ belongs to the confidence band defined in (14) is equivalent to

$$\sup_{t \in [0,T]} \frac{\sqrt{n}}{\hat\sigma(t)} |\hat\mu_{MA,d}(t) - \mu(t)| \leq c.$$

Given a confidence level $1 - \alpha \in (0, 1)$, one way to build such a confidence
band, that is to say one way to find an adequate value for $c_\alpha$, is to perform
simulations of a centered Gaussian function $Z$ defined on $[0,T]$ with mean 0
and covariance function $n \hat\gamma_{MA,d}(r, t)$ and then compute the quantile of order
$1 - \alpha$ of $\sup_{t \in [0,T]} |Z(t)/\hat\sigma(t)|$. In other words, we look for a constant $c_\alpha$, which is
in fact a random variable since it depends on the estimated covariance function
$\hat\gamma_{MA,d}$, such that

$$P \left( |Z(t)| \leq c_\alpha \hat\sigma(t), \ \forall t \in [0,T] \mid \hat\gamma_{MA,d} \right) = 1 - \alpha.$$
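
The simulation step can be sketched as follows on a discretization grid. The function name `band_constant`, the number of simulated trajectories, and the toy covariance matrix and mean curve are hypothetical choices made for the illustration; in an application the estimated covariance function and the estimated mean curve would be used instead.

```python
import numpy as np

def band_constant(gamma_hat, n, alpha=0.05, n_sim=5000, seed=0):
    """Approximate c_alpha by simulating a centred Gaussian vector on the grid.

    gamma_hat : (D, D) estimated covariance of the mean estimator on the grid
    n         : sample size
    Returns the simulated quantile and sigma_hat(t) = sqrt(n * gamma_hat(t, t)).
    """
    rng = np.random.default_rng(seed)
    cov = n * gamma_hat
    sigma = np.sqrt(np.diag(cov))
    Z = rng.multivariate_normal(np.zeros(sigma.size), cov, size=n_sim)
    sup = np.abs(Z / sigma).max(axis=1)      # sup_t |Z(t)| / sigma_hat(t) per trajectory
    return np.quantile(sup, 1.0 - alpha), sigma

# Toy inputs (hypothetical): exponential covariance on a grid of 50 instants
n, D = 100, 50
t = np.linspace(0.0, 1.0, D)
gamma_hat = np.exp(-np.abs(t[:, None] - t[None, :]) / 0.2) / n
mu_hat = np.sin(2 * np.pi * t)

c_alpha, sigma = band_constant(gamma_hat, n)
lower = mu_hat - c_alpha * sigma / np.sqrt(n)
upper = mu_hat + c_alpha * sigma / np.sqrt(n)
```

The constant obtained this way is larger than the pointwise Gaussian quantile, since the band has to cover the whole trajectory simultaneously.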



The asymptotic coverage of this simulation based procedure has been rigorously
studied for the Horvitz-Thompson estimators of the mean of sampled and
noisy trajectories in Cardot et al. (2012a) whereas Cardot et al. (2012b) have
successfully employed this approach on real load curves with model-assisted
estimators. The next proposition, which can be seen as a functional version of
Slutsky's Lemma, provides a rigorous justification of this latter technique.

Proposition 3.5. Assume (A1)-(A8) hold and the discretization scheme satisfies
$\max_{i \in \{1, \ldots, D_N - 1\}} |t_{i+1} - t_i|^{2\beta} = o(n^{-1})$.
Let $Z$ be a Gaussian process with mean zero and covariance function $\gamma_Z$ (as
in Proposition 3.4). Let $(\hat Z_N)$ be a sequence of processes such that for each $N$,
conditionally on the estimator $\hat\gamma_{MA,d}$ defined in (13), $\hat Z_N$ is Gaussian with mean
zero and covariance $n \hat\gamma_{MA,d}$. Suppose that $\gamma_Z(t, t)$ is a continuous function and
$\inf_t \gamma_Z(t, t) > 0$. Then, as $N \to \infty$, the following convergence holds uniformly
in $c$,

$$P \left( |\hat Z_N(t)| \leq c \, \hat\sigma(t), \ \forall t \in [0,T] \mid \hat\gamma_{MA,d} \right) \to P \left( |Z(t)| \leq c \, \sigma(t), \ \forall t \in [0,T] \right),$$

where $\hat\sigma(t) = \sqrt{n \hat\gamma_{MA,d}(t, t)}$ and $\sigma(t) = \sqrt{\gamma_Z(t, t)}$.

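As in Cardot et al. (2012a), it is possible to deduce from the previous proposition
that the chosen value $\hat c_\alpha = c_\alpha(\hat\gamma_{MA,d})$ provides asymptotically the desired
coverage since it satisfies

$$\lim_{N \to \infty} P \left( \mu(t) \in \left[ \hat\mu_{MA,d}(t) \pm \hat c_\alpha \frac{\hat\sigma(t)}{\sqrt{n}} \right], \ \forall t \in [0,T] \right) = 1 - \alpha.$$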



4. An illustration on electricity consumption curves 



We consider, as in Cardot et al. (2012b), a population of $N = 15069$ electricity
consumption curves, measured every 30 minutes over a period of one week. Each
element $k$ of the population is thus a vector with size 336, denoted by $(Y_k(t), t \in
\{1, \ldots, 336\})$. The auxiliary information $X$ of values $x_k$, $k \in U$, is simply the
mean consumption of each meter $k \in U$ recorded during the week before the
sample is drawn. As shown in Figure 1, the real variable $X$ is strongly correlated
with the consumption at each instant $t$ of the current period of estimation so
that a linear model with a functional response is well adapted for model-assisted
estimation.

We draw $I = 10000$ samples $s_i$ of size $n$, $i = 1, \ldots, I$, with simple random
sampling without replacement (SRSWOR) so that $\pi_k = n/N$ for $k \in \{1, \ldots, N\}$.
We define, for each sample $s_i$, the model-assisted estimator of the mean curve,

$$\hat\mu^{(i)}_{MA,d}(t) = \frac{1}{N} \sum_{k \in U} \hat Y^{(i)}_k(t) - \frac{1}{N} \sum_{k \in s_i} \frac{\hat Y^{(i)}_k(t) - Y_k(t)}{n/N}, \tag{15}$$

where $\mathbf{x}_k = (1, x_k)'$, $\hat Y^{(i)}_k(t) = \mathbf{x}_k' \hat\beta^{(i)}(t)$, and $\hat\beta^{(i)}(t) = \hat G^{-1} \frac{1}{N} \sum_{k \in s_i} \frac{\mathbf{x}_k Y_k(t)}{n/N}$
for $t \in \{1, \ldots, 336\}$. Cardot et al. (2012b) noted that, for the same sample
size, the mean square error of estimation of the mean curve is divided by four
compared to the Horvitz-Thompson estimator with SRSWOR when considering
the model-assisted estimator defined in (15). There is only one covariate in this
study and we did not encounter any problem with the invertibility of matrix $\hat G$,
so the value of parameter $a$ is set to $a = 0$.

We also define $\bar\mu(t) = \frac{1}{I} \sum_{i=1}^{I} \hat\mu^{(i)}_{MA,d}(t)$, $t \in \{1, \ldots, 336\}$. The true variance
function of the model-assisted estimator being unknown, we approximate it with
a Monte Carlo approach based on the $I = 10000$ samples drawn with simple
random sampling without replacement. The approximation to the true variance
function is thus given by

$$\gamma_{emp}(r, t) = \frac{1}{I} \sum_{i=1}^{I} \left( \hat\mu^{(i)}_{MA,d}(t) - \bar\mu(t) \right) \left( \hat\mu^{(i)}_{MA,d}(r) - \bar\mu(r) \right) \tag{16}$$

for $r, t \in \{1, \ldots, 336\}$.

Figure 2. Empirical variance function $\gamma_{emp}$, approximated variance $\gamma_{MA}$ and estimated
variance $\hat\gamma_{MA,d}$ obtained with a sample of size $n = 1500$.

The following quadratic loss criterion, which measures a relative error, is used
to evaluate, for each sample, the accuracy of the variance estimator defined in
(13),

$$E_r(\hat\gamma_{MA,d}) = \frac{1}{336} \sum_{t=1}^{336} \frac{|\hat\gamma_{MA,d}(t, t) - \gamma_{emp}(t, t)|^2}{\gamma_{emp}(t, t)^2}. \tag{17}$$



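We also decompose, over the $I = 10000$ estimations, the relative mean square
error (RMSE) of the estimator into an approximation error ($RB(\hat\gamma_{MA,d})^2$) and
a variance term ($VR(\hat\gamma_{MA,d})$) that can be related to the sampling error,

$$RMSE(\hat\gamma_{MA,d}) = \frac{1}{I} \sum_{i=1}^{I} E_r^{(i)}(\hat\gamma_{MA,d}) = RB(\hat\gamma_{MA,d})^2 + VR(\hat\gamma_{MA,d}),$$

where $E_r^{(i)}(\hat\gamma_{MA,d})$ is the value of $E_r(\hat\gamma_{MA,d})$ for the $i$th sample. The relative
bias of the estimator $\hat\gamma_{MA,d}$ may be written as

$$RB(\hat\gamma_{MA,d})^2 = \frac{1}{336} \sum_{t=1}^{336} \left( \frac{\bar\gamma_{MA,d}(t, t) - \gamma_{emp}(t, t)}{\gamma_{emp}(t, t)} \right)^2,$$

where $\bar\gamma_{MA,d}(t, t) = \frac{1}{I} \sum_{i=1}^{I} \hat\gamma^{(i)}_{MA,d}(t, t)$.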
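
The criterion and its decomposition can be computed as in the following sketch. The variance term is taken here, as an assumption, to be the average over the grid of the Monte Carlo variance of the estimated variance function divided by the squared reference variance, which is what the decomposition implies; the simulated inputs and the function name `variance_error_summary` are hypothetical.

```python
import numpy as np

def variance_error_summary(gamma_hat_diag, gamma_emp_diag):
    """Relative error criterion per sample and its bias/variance decomposition.

    gamma_hat_diag : (I, D) estimated variance functions, one row per sample
    gamma_emp_diag : (D,) Monte Carlo reference variance function
    """
    rel = (gamma_hat_diag - gamma_emp_diag) / gamma_emp_diag
    E_r = (rel**2).mean(axis=1)                          # relative quadratic error, per sample
    rmse = E_r.mean()                                    # average over the I samples
    gamma_bar = gamma_hat_diag.mean(axis=0)              # mean estimated variance function
    rb2 = (((gamma_bar - gamma_emp_diag) / gamma_emp_diag) ** 2).mean()
    vr = (gamma_hat_diag.var(axis=0) / gamma_emp_diag**2).mean()   # assumed form of VR
    return E_r, rmse, rb2, vr

# Simulated inputs (hypothetical): 2% relative bias, 10% relative noise
rng = np.random.default_rng(4)
I, D = 200, 336
gamma_emp_diag = 1.0 + 0.5 * np.sin(np.linspace(0, 14 * np.pi, D)) ** 2
gamma_hat_diag = gamma_emp_diag * (1.02 + 0.1 * rng.normal(size=(I, D)))
E_r, rmse, rb2, vr = variance_error_summary(gamma_hat_diag, gamma_emp_diag)
```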



                                                                      Quantiles of $E_r(\hat\gamma_{MA,d})$
Sample size   $RMSE(\hat\gamma_{MA,d})$   $RB(\hat\gamma_{MA,d})^2$   5%       25%      Median   75%      95%
250           0.1315                      0.0027                      0.0264   0.0455   0.0707   0.117    0.4945
500           0.0697                      0.0016                      0.0166   0.029    0.0459   0.0794   0.1945
1500          0.0238                      0.0003                      0.0076   0.0125   0.0186   0.028    0.0569

Table 1
Summary statistics for $E_r(\hat\gamma_{MA,d}, \gamma_{emp})$, with $I = 10000$ samples.



The RMSE as well as the approximation error and statistics (quantiles) for
$Er$ are given in Table 1. We can note that, as expected, the RMSE decreases as
the sample size increases and that, even for moderate sample sizes, the
estimations are rather precise. A closer look at how the RMSE is decomposed
reveals that the estimation error is mainly due to the sampling error, via the
variance term, whereas the approximation error term $RB(\hat{\gamma}_{MA,d})^2$
is negligible. This fact can be observed in Figure 2, where we plot the true
variance function $\gamma_{emp}$ over the considered period, its approximation
$\gamma_{MA}$, as well as an estimation $\hat{\gamma}_{MA,d}$ obtained with a
sample of size n = 1500, whose error according to criterion (17) is close to
the mean error ($Er \approx 0.02$).

We have also plotted in Figure 3 the difference between the empirical
covariance function $\gamma_{emp}$ and its approximation $\gamma_{MA}$, and in
Figure 4 the difference between $\gamma_{MA}$ and its estimation
$\hat{\gamma}_{MA,d}$ for a sample of size n = 1500 whose error,
$Er \approx 0.02$, is close to the mean value. Once again, it is clearly seen
that the approximation error to the true covariance function (see Figure 3) is
much smaller than the sampling error (see Figure 4). We can also note a strong
periodic pattern which reflects the natural daily periodicity in electricity
consumption behavior and which is related to the temporal correlation of the
unknown residual process $\epsilon_{kt}$ of the regression model.



5. Concluding remarks 



We have made in this paper an asymptotic study of model-assisted estimators,
with linear regression models with functional response, when the target is the
mean of functional data with discrete observations in time. This work can be
extended in many directions. For example, one could consider regression models
more sophisticated than the linear one, such as nonlinear or nonparametric
models with functional response, by adapting, in a survey sampling context,
models studied in the functional data analysis literature (see Chiou et al.
(2004), Cardot (2007), or Ferraty et al. (2011)). However, one important
drawback of such more sophisticated approaches is that they would require
knowing the auxiliary variables $x_k$ for all the units $k$ in the population,
as opposed to only their population totals.

An interesting direction for future investigation would be to consider noisy
and possibly sparse measurements in time. For the Horvitz-Thompson estimator,
local polynomials are employed in Cardot et al. (2012a) in order to first
smooth the trajectories, and it would certainly be possible to adapt the
techniques developed in this work to the model-assisted estimation procedure.

Another promising direction for future research would be to adapt
model-assisted estimators for time-varying samples. When one works with large
networks of sensors, it is possible to consider a sequence of samples s(t) that
evolve along time. A preliminary work (see Degras (2012)), which focuses on
Horvitz-Thompson estimators and stratified sampling, clearly shows that such
time-varying samples can outperform sampling designs that are fixed in time.



Acknowledgements. We thank the two anonymous referees for a careful read- 
Figure 3. (Approximation error) difference between the covariance function and
its approximation, $\gamma_{emp}(t,r) - \gamma_{MA}(t,r)$, for a sample with
size n = 1500.

Figure 4. (Sampling error) difference between the approximated covariance
function and its estimation, $\gamma_{MA}(t,r) - \hat{\gamma}_{MA,d}(t,r)$, for
a sample with size n = 1500.

Appendix A: Proofs

Throughout the proofs we use the letter C to denote a generic constant whose
value may vary from place to place. We also denote by
$\alpha_k = \frac{\mathbb{1}_k}{\pi_k} - 1$, $k \in U$, and by
$\Delta_{kl} = \pi_{kl} - \pi_k\pi_l$, $k,l \in U$.

A.1. Some useful Lemmas

Note that the result shown in the first Lemma is sometimes stated as an
assumption (see e.g. Robinson and Särndal (1983)). It is used to prove the
convergence of the estimator of the mean in terms of mean square error.

Lemma A.1. Let assumptions (A1), (A2), (A4) and (A5) hold. Then, there is
a constant C such that

$$n\,E_p\left(\|\hat{G}_a^{-1} - G^{-1}\|^2\right) \le C.$$



Proof. The proof follows the lines of Bosq (2000), Theorem 8.4, and Cardot
et al. (2010), Proposition 3.1. Using assumption (A5) and inequality (7), we
have

$$\|\hat{G}_a^{-1} - G^{-1}\| \le \|\hat{G}_a^{-1}\|\,\|\hat{G}_a - G\|\,\|G^{-1}\|
\le a^{-2}\,\|\hat{G}_a - G\|,$$

which implies

$$E_p\left(\|\hat{G}_a^{-1} - G^{-1}\|^2\right) \le a^{-4}\,E_p\left(\|\hat{G}_a - G\|^2\right). \qquad (18)$$



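To bound $E_p(\|\hat{G}_a - G\|^2)$, we use the following decomposition.

$$\begin{aligned}
E_p\left(\|\hat{G}_a - G\|^2\right)
&= E_p\left(\|\hat{G}_a - G\|^2\,\mathbb{1}_{\{\hat{G}_a = \hat{G}\}}\right)
 + E_p\left(\|\hat{G}_a - G\|^2\,\mathbb{1}_{\{\hat{G}_a \ne \hat{G}\}}\right) \\
&\le E_p\left(\|\hat{G} - G\|^2\right)
 + 2E_p\left(\|\hat{G}_a - \hat{G}\|^2\,\mathbb{1}_{\{\hat{G}_a \ne \hat{G}\}}\right)
 + 2E_p\left(\|\hat{G} - G\|^2\,\mathbb{1}_{\{\hat{G}_a \ne \hat{G}\}}\right) \\
&\le 3E_p\left(\|\hat{G} - G\|^2\right)
 + 2E_p\left(\|\hat{G}_a - \hat{G}\|^2\,\mathbb{1}_{\{\hat{G}_a \ne \hat{G}\}}\right). \qquad (19)
\end{aligned}$$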



To bound the first term on the right-hand side of (19), we use the fact that
the spectral norm is bounded by the trace norm $\|\cdot\|_2$ defined by
$\|A\|_2^2 = \mathrm{tr}(A'A)$. Next, we show (see also Cardot et al. (2010),
proof of Proposition 3.1) that

$$E_p\|\hat{G} - G\|_2^2 = O(n^{-1}). \qquad (20)$$
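We have, with assumptions (A1), (A2) and (A4), that

$$\begin{aligned}
E_p\|\hat{G} - G\|_2^2
&= \frac{1}{N^2}\,E_p\left(\sum_{k\in U}\sum_{l\in U}\alpha_k\alpha_l\,\mathrm{tr}[x_k x_k' x_l x_l']\right) \\
&\le \frac{1}{N^2}\sum_{k\in U}\frac{1-\pi_k}{\pi_k}\,\|x_k x_k'\|_2^2
 + \frac{\max_{k\ne l\in U}|\Delta_{kl}|}{\lambda^2}\,\frac{1}{N^2}\sum_{k\in U}\sum_{l\in U}\|x_k\|^2\|x_l\|^2 \\
&\le \frac{1}{n}\left(\frac{n}{N\lambda} + \frac{n\max_{k\ne l\in U}|\Delta_{kl}|}{\lambda^2}\right)C_4^2 \\
&\le \frac{C}{n}.
\end{aligned}$$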

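On the other hand,

$$E_p\left(\|\hat{G}_a - \hat{G}\|^2\,\mathbb{1}_{\{\hat{G}_a \ne \hat{G}\}}\right) \le a^2\,P(\hat{G}_a \ne \hat{G})$$

since

$$\|\hat{G}_a - \hat{G}\|^2
= \Big\|\sum_{j=1}^{p}\left[\max(\eta_{j,n}, a) - \eta_{j,n}\right]v_{jn}v_{jn}'\Big\|^2
\le \sup_{j}\left|\max(\eta_{j,n}, a) - \eta_{j,n}\right|^2
\le a^2.$$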


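Moreover, since $a < \eta_p = \|G^{-1}\|^{-1}$ and by the Chebyshev inequality,
we can bound

$$\begin{aligned}
P(\hat{G}_a \ne \hat{G}) = P(\eta_{p,n} < a)
&\le P\left(|\eta_{p,n} - \eta_p| \ge |\eta_p - a|\right) \\
&\le \frac{E_p\left(|\eta_{p,n} - \eta_p|^2\right)}{(\eta_p - a)^2} \\
&\le \frac{E_p\left(\|\hat{G} - G\|^2\right)}{(\eta_p - a)^2}
\end{aligned}$$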



because it is known that the eigenvalue map is Lipschitz for symmetric
matrices (see Bhatia (1997), Chapter 3). This means that for two $p \times p$
symmetric matrices A and B, with eigenvalues
$\eta_1(A) \ge \eta_2(A) \ge \cdots \ge \eta_p(A)$ (resp.
$\eta_1(B) \ge \cdots \ge \eta_p(B)$), we have

$$\max_{j\in\{1,\ldots,p\}}|\eta_j(A) - \eta_j(B)| \le \|A - B\|.$$
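The two matrix facts used in this part of the proof can be illustrated
numerically. The sketch below, on randomly generated matrices, checks the
Lipschitz property of ordered eigenvalues with respect to the spectral norm,
and the effect of replacing eigenvalues smaller than a by a. The helper
`truncate_eigenvalues` is a hypothetical name introduced for this illustration
and the matrices are arbitrary; it is not the authors' code.

```python
import numpy as np

def truncate_eigenvalues(G_hat, a):
    """Rebuild a symmetric matrix after raising every eigenvalue below a up to a."""
    eta, V = np.linalg.eigh(G_hat)            # eigenvalues in ascending order
    return (V * np.maximum(eta, a)) @ V.T     # V diag(max(eta, a)) V'

rng = np.random.default_rng(1)
p = 4

# Lipschitz property: ordered eigenvalues move by at most the spectral norm.
X = rng.standard_normal((50, p))
A = X.T @ X / 50
E = 0.1 * rng.standard_normal((p, p))
B = A + (E + E.T) / 2                          # symmetric perturbation of A
eig_gap = np.max(np.abs(np.linalg.eigvalsh(A) - np.linalg.eigvalsh(B)))
spec_norm = np.linalg.norm(A - B, 2)

# Truncation of a rank-deficient positive semi-definite matrix.
a = 0.05
X_small = rng.standard_normal((3, p))          # 3 rows only, so rank at most 3
G_hat = X_small.T @ X_small / 3
G_a = truncate_eigenvalues(G_hat, a)
min_eig = np.linalg.eigvalsh(G_a).min()        # smallest eigenvalue after truncation
shift = np.linalg.norm(G_a - G_hat, 2)         # spectral norm of the modification
print(eig_gap <= spec_norm, min_eig, shift)
```

After truncation the smallest eigenvalue is at least a, so the inverse has
spectral norm at most 1/a, and the modification itself has spectral norm at
most a, which are the two properties the proof relies on.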



Hence, for some constant C,

$$E_p\left(\|\hat{G}_a - G\|^2\right) \le 3E_p\left(\|\hat{G} - G\|^2\right) + 2a^2\,P(\hat{G}_a \ne \hat{G}) \le \frac{C}{n}. \qquad (21)$$

Combining (18), (19), (20) and (21), the proof is complete. □



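Lemma A.2. Under assumptions (A1), (A2) and (A4), there is a constant C
such that, for all n,

$$n\,E_p\left(\Big\|\frac{1}{N}\sum_{k\in U}\Big(\frac{\mathbb{1}_k}{\pi_k} - 1\Big)x_k\Big\|^2\right) \le C.$$

Proof. Expanding the square norm, we have

$$\begin{aligned}
n\,E_p\left(\Big\|\frac{1}{N}\sum_{k\in U}\alpha_k x_k\Big\|^2\right)
&= n\,E_p\left(\frac{1}{N^2}\sum_{k\in U}\sum_{l\in U}\alpha_k\alpha_l\,x_k'x_l\right) \\
&\le \frac{n}{N^2}\sum_{k\in U}\sum_{l\in U}\frac{|\Delta_{kl}|}{\pi_k\pi_l}\,\|x_k\|\,\|x_l\| \\
&\le \left(\frac{n}{N\lambda} + \frac{n\max_{k\ne l\in U}|\Delta_{kl}|}{\lambda^2}\right)\frac{1}{N}\sum_{k\in U}\|x_k\|^2
\end{aligned}$$

and the result follows with hypotheses (A1), (A2) and (A4). □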
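For simple random sampling without replacement, the quantity bounded in
Lemma A.2 can be computed exactly from the double sum over $\Delta_{kl}$, which
gives a concrete check of the expansion used in the proof. The sketch below is
an illustration on synthetic auxiliary variables; the function name
`n_times_mse` and the population are assumptions made for the example. It
relies on the standard facts that, for this design, $\pi_k = n/N$ and
$\pi_{kl} = n(n-1)/(N(N-1))$ for $k \ne l$.

```python
import numpy as np

def n_times_mse(x, n):
    """n * E_p || N^{-1} sum_k alpha_k x_k ||^2 under simple random sampling
    without replacement, computed from the double-sum formula."""
    N = x.shape[0]
    pi = n / N                                        # first order inclusion probability
    delta_off = pi * (n - 1) / (N - 1) - pi ** 2      # Delta_kl for k != l
    delta = np.full((N, N), delta_off)
    np.fill_diagonal(delta, pi * (1 - pi))            # Delta_kk
    gram = x @ x.T                                    # inner products x_k' x_l
    # E_p(alpha_k alpha_l) = Delta_kl / (pi_k pi_l)
    return n * np.sum(delta * gram) / (pi ** 2 * N ** 2)

rng = np.random.default_rng(2)
N, n = 500, 50
x = rng.standard_normal((N, 2))                       # synthetic auxiliary variables
value = n_times_mse(x, n)
# Classical closed form: (1 - n/N) times the trace of the population covariance.
closed_form = (1 - n / N) * np.trace(np.cov(x, rowvar=False))
print(value, closed_form)
```

The two numbers agree, and the closed form makes it visible that the quantity
stays bounded as n and N grow whenever the population covariance of the $x_k$
is bounded.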



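Lemma A.3. Under assumptions (A2)-(A5), we have

i) $\|\tilde{\beta}(t) - \tilde{\beta}(r)\|^2 \le a^{-2} C_3 C_4 |t-r|^{2\beta}$.
ii) $\|\hat{\beta}_a(t) - \hat{\beta}_a(r)\|^2 \le \frac{a^{-2}}{\lambda^2} C_3 C_4 |t-r|^{2\beta}$.

Proof. For i), we just need to remark that, under hypotheses (A3), (A4) and
(A5),

$$\begin{aligned}
\|\tilde{\beta}(t) - \tilde{\beta}(r)\|^2
&= \Big\|G^{-1}\frac{1}{N}\sum_{k\in U}x_k\,(Y_k(t) - Y_k(r))\Big\|^2 \\
&\le \|G^{-1}\|^2\left(\frac{1}{N}\sum_{k\in U}\|x_k\|^2\right)\left(\frac{1}{N}\sum_{k\in U}(Y_k(t) - Y_k(r))^2\right) \\
&\le a^{-2} C_4 C_3 |t-r|^{2\beta}.
\end{aligned}$$

The proof of point ii) is similar, but also requires the use of lower bounds on
the first order inclusion probabilities (assumption (A2)),

$$\|\hat{\beta}_a(t) - \hat{\beta}_a(r)\|^2
\le \|\hat{G}_a^{-1}\|^2\left(\frac{1}{N}\sum_{k\in U}\|x_k\|^2\right)\left(\frac{1}{N}\sum_{k\in U}\frac{(Y_k(t) - Y_k(r))^2}{\pi_k^2}\right)
\le \frac{a^{-2}}{\lambda^2} C_3 C_4 |t-r|^{2\beta}.$$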

The following Lemma states the pointwise mean square convergence for any
fixed value of $t \in [0,T]$.

Lemma A.4. Suppose that assumptions (A1)-(A5) hold. Then, there is a positive
constant $\zeta_1$ such that, for all $t \in [0,T]$,

$$n\,E_p\left(\|\hat{\beta}_a(t) - \tilde{\beta}(t)\|^2\right) \le \zeta_1.$$

Proof. The demonstration is similar to the proof of Lemma A.5 and is thus
omitted. □



Lemma A.5. Suppose that assumptions (A1)-(A5) hold. Then, there is a positive
constant $\zeta_2$ such that

$$n\,E_p\left(\|\hat{\beta}_a(t) - \tilde{\beta}(t) - \hat{\beta}_a(r) + \tilde{\beta}(r)\|^2\right) \le \zeta_2\,|t-r|^{2\beta}.$$



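Proof. A direct decomposition leads to

$$\begin{aligned}
&n\,\|\hat{\beta}_a(t) - \tilde{\beta}(t) - \hat{\beta}_a(r) + \tilde{\beta}(r)\|^2 \\
&\quad = n\,\Big\|(\hat{G}_a^{-1} - G^{-1})\frac{1}{N}\sum_{k\in s}\frac{1}{\pi_k}x_k(Y_k(t) - Y_k(r))
 + G^{-1}\frac{1}{N}\sum_{k\in U}\Big(\frac{\mathbb{1}_k}{\pi_k} - 1\Big)x_k(Y_k(t) - Y_k(r))\Big\|^2 \\
&\quad \le 2A_{1N}^2 + 2A_{2N}^2, \qquad (22)
\end{aligned}$$

where
$A_{1N}^2 = n\,\|\hat{G}_a^{-1} - G^{-1}\|^2\,\big\|\frac{1}{N}\sum_{k\in s}\frac{1}{\pi_k}x_k(Y_k(t) - Y_k(r))\big\|^2$
and
$A_{2N}^2 = n\,\|G^{-1}\|^2\,\big\|\frac{1}{N}\sum_{k\in U}\alpha_k x_k(Y_k(t) - Y_k(r))\big\|^2$.
Using now assumptions (A2)-(A4) and the Cauchy-Schwarz inequality, we get

$$A_{1N}^2 \le n\,\|\hat{G}_a^{-1} - G^{-1}\|^2\left(\frac{1}{N}\sum_{k\in U}\frac{\|x_k\|^2}{\pi_k^2}\right)\left(\frac{1}{N}\sum_{k\in U}(Y_k(t) - Y_k(r))^2\right)
\le n\,\|\hat{G}_a^{-1} - G^{-1}\|^2\,\frac{1}{\lambda^2}\,C_3 C_4\,|t-r|^{2\beta}.$$

Using now Lemma A.1 we can bound

$$E_p(A_{1N}^2) \le C|t-r|^{2\beta}, \qquad (23)$$

for some constant C. Now, with assumptions (A1)-(A5) and following the same
arguments as in the proof of Lemma A.2 we also have

$$\begin{aligned}
E_p(A_{2N}^2)
&\le n\,\|G^{-1}\|^2\,E_p\left(\Big\|\frac{1}{N}\sum_{k\in U}\alpha_k x_k(Y_k(t) - Y_k(r))\Big\|^2\right) \\
&\le \left(\frac{n}{N\lambda} + \frac{n\max_{k\ne l\in U}|\Delta_{kl}|}{\lambda^2}\right)C_3 C_4\,a^{-2}\,|t-r|^{2\beta}
\le C|t-r|^{2\beta}, \qquad (24)
\end{aligned}$$

for some positive constant C. Combining (22), (23) and (24), the result is
proved. □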



A.2. Proof of Proposition 3.1 and Proposition 3.2

The proof of Proposition 3.1 is omitted. It is analogous to the proof of
Proposition 3.2, which is given below. The different steps are similar to the
proof of Proposition 1 in Cardot and Josserand (2011).

Let us decompose, for $t \in [0,T]$,

$$\sup_{t\in[0,T]}|\hat{\mu}_{MA,d}(t) - \mu(t)|
\le \sup_{t\in[0,T]}|\hat{\mu}_{MA,d}(t) - \hat{\mu}_{MA,a}(t)|
 + \sup_{t\in[0,T]}|\hat{\mu}_{MA,a}(t) - \mu(t)| \qquad (25)$$

and study each term at the right-hand side of the inequality separately.



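Step 1. The interpolation error $\sup_{t\in[0,T]}|\hat{\mu}_{MA,d}(t) - \hat{\mu}_{MA,a}(t)|$.
Consider $t \in [t_i, t_{i+1})$ and write

$$|\hat{\mu}_{MA,d}(t) - \hat{\mu}_{MA,a}(t)| \le |\hat{\mu}_{MA,a}(t_i) - \hat{\mu}_{MA,a}(t)| + |\hat{\mu}_{MA,a}(t_{i+1}) - \hat{\mu}_{MA,a}(t_i)|.$$

Under assumptions (A2)-(A5) and using Lemma A.3 ii), we get

$$|\hat{\mu}_{MA,a}(t) - \hat{\mu}_{MA,a}(r)|
\le \Big|\frac{1}{N}\sum_{k\in U}\alpha_k x_k'(\hat{\beta}_a(t) - \hat{\beta}_a(r))\Big|
 + \Big|\frac{1}{N}\sum_{k\in s}\frac{Y_k(t) - Y_k(r)}{\pi_k}\Big|,$$

and both terms are bounded by a multiple of $|t-r|^{\beta}$ that depends only on
$a$, $\lambda$, $C_3$ and $C_4$. So, there is a positive constant C such that

$$|\hat{\mu}_{MA,a}(t) - \hat{\mu}_{MA,a}(r)| \le C|t-r|^{\beta}$$

and consequently,

$$|\hat{\mu}_{MA,d}(t) - \hat{\mu}_{MA,a}(t)| \le C\left[|t_i - t|^{\beta} + |t_{i+1} - t_i|^{\beta}\right] \le 2C\,|t_{i+1} - t_i|^{\beta}.$$

Hence, since by hypothesis
$\lim_{N\to\infty}\max_{i\in\{1,\ldots,d_N-1\}}|t_{i+1} - t_i|^{\beta} = o(n^{-1/2})$,
we have

$$\sup_{t\in[0,T]}\sqrt{n}\,|\hat{\mu}_{MA,d}(t) - \hat{\mu}_{MA,a}(t)| = o(1). \qquad (26)$$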
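Step 2. The estimation error $\sup_{t\in[0,T]}|\hat{\mu}_{MA,a}(t) - \mu(t)|$.
We use the following decomposition:

$$\sup_{t\in[0,T]}|\hat{\mu}_{MA,a}(t) - \mu(t)|
\le |\hat{\mu}_{MA,a}(0) - \mu(0)|
 + \sup_{t\in[0,T]}|\hat{\mu}_{MA,a}(t) - \mu(t) - \hat{\mu}_{MA,a}(0) + \mu(0)|. \qquad (27)$$

Writing

$$\hat{\mu}_{MA,a}(0) - \mu(0) = \frac{1}{N}\sum_{k\in U}\alpha_k Y_k(0) - \frac{1}{N}\sum_{k\in U}\Big(\frac{\mathbb{1}_k}{\pi_k} - 1\Big)\hat{Y}_{k,a}(0),$$

we directly get, with hypotheses (A1)-(A3) and with similar arguments as in the
proof of Lemma A.2, that for some constant C,

$$E_p\left(\hat{\mu}_{MA,a}(0) - \mu(0)\right)^2 \le \frac{C}{n}. \qquad (28)$$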



The second term at the right-hand side in (27) is dealt with using maximal
inequalities. More exactly, we use Corollary 2.2.5 in van der Vaart and Wellner
(2000). Consider for this the Orlicz norm of some random variable X, which is
defined as follows

$$\|X\|_{\psi} = \inf\left\{C > 0 : E_p\,\psi\left(\frac{|X|}{C}\right) \le 1\right\}.$$

For the particular case $\psi(u) = u^2$, the Orlicz norm is simply the
well-known $L^2$ norm, $\|X\|_{\psi} = \sqrt{E_p(X^2)}$. Let us introduce for
$(r,t) \in [0,T]^2$ the semimetric $d(r,t)$ defined by

$$\begin{aligned}
d^2(r,t) &= \left\|\sqrt{n}\,|\hat{\mu}_{MA,a}(t) - \mu(t) - \hat{\mu}_{MA,a}(r) + \mu(r)|\right\|_{\psi}^2 \\
&= n\,E_p\left(|\hat{\mu}_{MA,a}(t) - \mu(t) - \hat{\mu}_{MA,a}(r) + \mu(r)|^2\right)
\end{aligned}$$

and consider $D(\epsilon,d)$, the packing number, which is defined as the maximum
number of points in $[0,T]$ whose distance between each pair is strictly larger
than $\epsilon$. Then, Corollary 2.2.5 in van der Vaart and Wellner (2000)
states that there is a constant $K > 0$ such that

$$\Big\|\sup_{(r,t)\in[0,T]^2}\sqrt{n}\,|\hat{\mu}_{MA,a}(t) - \mu(t) - \hat{\mu}_{MA,a}(r) + \mu(r)|\Big\|_{\psi}
\le K\int_0^T \psi^{-1}(D(\epsilon,d))\,d\epsilon. \qquad (29)$$

We show below that there is a constant C such that
$d^2(r,t) \le C|t-r|^{2\beta}$ and thus, since $\beta > 1/2$, the integral at the
right-hand side of (29) is finite.
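Let us first decompose

$$d^2(r,t) \le 2d_1^2(r,t) + 2d_2^2(r,t) \qquad (30)$$

where

$$d_1^2(r,t) = n\,E_p\left(|\hat{\mu}_{MA,a}(t) - \tilde{\mu}(t) - \hat{\mu}_{MA,a}(r) + \tilde{\mu}(r)|^2\right)$$

and

$$d_2^2(r,t) = n\,E_p\left(|\tilde{\mu}(t) - \mu(t) - \tilde{\mu}(r) + \mu(r)|^2\right).$$

By assumptions (A2)-(A4) and Lemma A.5 we can bound, for some constant C,

$$d_1^2(r,t) \le E_p\left(n\,\Big\|\frac{1}{N}\sum_{k\in U}\alpha_k x_k\Big\|^2\,\|\hat{\beta}_a(t) - \tilde{\beta}(t) - \hat{\beta}_a(r) + \tilde{\beta}(r)\|^2\right) \le C|t-r|^{2\beta}. \qquad (31)$$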



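Considering now $d_2(r,t)$, we have

$$\begin{aligned}
d_2^2(r,t) &= n\,E_p\left(\Big(\frac{1}{N}\sum_{k\in U}\alpha_k\left[Y_k(t) - Y_k(r) - x_k'(\tilde{\beta}(t) - \tilde{\beta}(r))\right]\Big)^2\right) \\
&\le 2E_p(A_N^2) + 2E_p(B_N^2) \qquad (32)
\end{aligned}$$

where $A_N^2 = n\left(\frac{1}{N}\sum_{k\in U}\alpha_k[Y_k(t) - Y_k(r)]\right)^2$ and
$B_N^2 = n\left(\frac{1}{N}\sum_{k\in U}\alpha_k x_k'(\tilde{\beta}(t) - \tilde{\beta}(r))\right)^2$.
With hypotheses (A1)-(A3), one can easily obtain that there is a positive
constant C such that

$$E_p(A_N^2) \le C|t-r|^{2\beta} \qquad (33)$$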
and thanks to Lemma A.2 and to Lemma A.3 we can bound

$$E_p(B_N^2) \le n\,E_p\left(\Big\|\frac{1}{N}\sum_{k\in U}\alpha_k x_k\Big\|^2\right)\|\tilde{\beta}(t) - \tilde{\beta}(r)\|^2 \le C|t-r|^{2\beta}. \qquad (34)$$

Combining now (33) and (34) with (30) and (31), we get that

$$d^2(r,t) \le C|t-r|^{2\beta}, \qquad (35)$$

for some constant C.

Using now (35), it is clear that the packing number is bounded as follows:
$D(\epsilon,d) = O(\epsilon^{-1/\beta})$. Consequently, the integral at the
right-hand side of (29) is finite when $\beta > 1/2$. Inserting (28) and (29) in
(27), the proof of step 2 is complete.
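The role of the condition $\beta > 1/2$ can be made concrete. With
$\psi(u) = u^2$ and a packing number of order $\epsilon^{-1/\beta}$, the
integrand in (29) behaves like $\epsilon^{-1/(2\beta)}$ near zero. The sketch
below evaluates the integral of this function over $[\delta, 1]$ in closed form
for decreasing $\delta$. Constants and the upper limit are normalised to 1,
which is an assumption made for the illustration only, and the function name is
hypothetical.

```python
import math

def truncated_entropy_integral(beta, delta):
    """Integral of eps**(-1/(2*beta)) over [delta, 1], in closed form."""
    s = 1.0 / (2.0 * beta)                 # exponent of the singularity at zero
    if math.isclose(s, 1.0):
        return -math.log(delta)            # boundary case beta = 1/2
    return (1.0 - delta ** (1.0 - s)) / (1.0 - s)

for beta in (0.75, 0.5, 0.4):
    values = [truncated_entropy_integral(beta, 10.0 ** -k) for k in (2, 4, 8)]
    print(beta, values)
```

For $\beta = 0.75$ the values stabilise near $2\beta/(2\beta - 1) = 3$, whereas
for $\beta = 0.5$ and $\beta = 0.4$ they keep growing as $\delta$ decreases,
which is why the maximal inequality is only informative when $\beta > 1/2$.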

A.3. Proof of the consistency of the covariance function

We first prove that for any $(r,t) \in [0,T]^2$, the estimator
$\hat{\gamma}_{MA,d}(r,t)$ of the covariance function converges to
$\gamma_{MA}(r,t)$.

Then we prove the uniform convergence of the variance estimator
$\hat{\gamma}_{MA,d}(t,t)$ by showing its convergence in distribution to zero in
the space of continuous functions. The proof is decomposed into two classical
steps (see for example Theorem 8.1 in Billingsley (1968)). We first show the
pointwise convergence, by considering the convergence of all finite linear
combinations, and then we check that the sequence is tight by bounding the
increments.

Step 1. Pointwise convergence

We want to show that for each $(t,r) \in [0,T]^2$, we have, when N tends to
infinity,

$$n\,E_p\{|\hat{\gamma}_{MA,d}(r,t) - \gamma_{MA}(r,t)|\} \to 0.$$

Let us decompose

$$n(\hat{\gamma}_{MA,d}(r,t) - \gamma_{MA}(r,t)) = n(\hat{\gamma}_{MA,d}(r,t) - \hat{\gamma}_{MA,a}(r,t)) + n(\hat{\gamma}_{MA,a}(r,t) - \gamma_{MA}(r,t))$$

where $\hat{\gamma}_{MA,a}(r,t)$ is defined by

$$\hat{\gamma}_{MA,a}(r,t) = \frac{1}{N^2}\sum_{k,l\in s}\frac{\Delta_{kl}}{\pi_{kl}}\,\frac{Y_k(r) - \hat{Y}_{k,a}(r)}{\pi_k}\,\frac{Y_l(t) - \hat{Y}_{l,a}(t)}{\pi_l}.$$

We study separately the interpolation and the estimation errors.
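Interpolation error

Let us suppose that $t \in [t_i, t_{i+1})$ and $r \in [t_{i'}, t_{i'+1})$. We have
$n\,|\hat{\gamma}_{MA,d}(r,t) - \hat{\gamma}_{MA,a}(r,t)| \le A + B$, with

$$\begin{aligned}
A = \frac{n}{N^2}\sum_{k,l\in U}\frac{|\Delta_{kl}|}{\pi_{kl}\pi_k\pi_l}\Big|
&(Y_{k,d}(r) - Y_k(r))(Y_{l,d}(t) - Y_l(t)) \\
&+ (Y_{k,d}(r) - Y_k(r))(Y_l(t) - \hat{Y}_{l,d}(t))
 + (Y_k(r) - \hat{Y}_{k,d}(r))(Y_{l,d}(t) - Y_l(t))\Big|
\end{aligned}$$

and

$$\begin{aligned}
B = \frac{n}{N^2}\sum_{k,l\in U}\frac{|\Delta_{kl}|}{\pi_{kl}\pi_k\pi_l}\Big|
&Y_k(r)(\hat{Y}_{l,a}(t) - \hat{Y}_{l,d}(t)) + Y_l(t)(\hat{Y}_{k,a}(r) - \hat{Y}_{k,d}(r)) \\
&+ \hat{Y}_{k,d}(r)\hat{Y}_{l,d}(t) - \hat{Y}_{k,a}(r)\hat{Y}_{l,a}(t)\Big|.
\end{aligned}$$

For $t \in [t_i, t_{i+1}]$, we can write

$$|Y_{l,d}(t) - Y_l(t)| \le |Y_l(t_i) - Y_l(t)| + |Y_l(t_{i+1}) - Y_l(t_i)|$$

and

$$|\hat{Y}_{l,a}(t) - \hat{Y}_{l,d}(t)| \le |\hat{Y}_{l,a}(t) - \hat{Y}_{l,a}(t_i)| + |\hat{Y}_{l,a}(t_{i+1}) - \hat{Y}_{l,a}(t_i)|.$$

We have that
$\frac{1}{N}\sum_{l\in U}(Y_{l,d}(t) - Y_l(t))^2 \le C[|t_i - t|^{2\beta} + |t_{i+1} - t_i|^{2\beta}]$
and $\frac{1}{N}\sum_{l\in U}(Y_l(t) - \hat{Y}_{l,d}(t))^2 = O(1)$. Thanks to
Lemma A.3, the increments of $\hat{Y}_{l,a}$ can be bounded in the same way.
Under the assumption on the grid of discretization points, one can get after
some algebra that

$$n\,|\hat{\gamma}_{MA,d}(r,t) - \hat{\gamma}_{MA,a}(r,t)| = o(1).$$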

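Estimation error

Consider now

$$\begin{aligned}
n(\hat{\gamma}_{MA,a}(r,t) - \gamma_{MA}(r,t))
&= \frac{n}{N^2}\sum_{k\in U}\sum_{l\in U}\frac{\Delta_{kl}}{\pi_k\pi_l}\Big(\frac{\mathbb{1}_{kl}}{\pi_{kl}} - 1\Big)\tilde{e}_k(t)\tilde{e}_l(r) \\
&\quad + \frac{n}{N^2}\sum_{k\in U}\sum_{l\in U}\frac{\Delta_{kl}}{\pi_k\pi_l}\frac{\mathbb{1}_{kl}}{\pi_{kl}}\,\tilde{e}_k(t)\tilde{\epsilon}_l(r) \\
&\quad + \frac{n}{N^2}\sum_{k\in U}\sum_{l\in U}\frac{\Delta_{kl}}{\pi_k\pi_l}\frac{\mathbb{1}_{kl}}{\pi_{kl}}\,\tilde{\epsilon}_k(t)\tilde{e}_l(r) \\
&\quad + \frac{n}{N^2}\sum_{k\in U}\sum_{l\in U}\frac{\Delta_{kl}}{\pi_k\pi_l}\frac{\mathbb{1}_{kl}}{\pi_{kl}}\,\tilde{\epsilon}_k(t)\tilde{\epsilon}_l(r) \\
&:= A_1(r,t) + A_2(r,t) + A_3(r,t) + A_4(r,t), \qquad (36)
\end{aligned}$$

where $\tilde{e}_k(t) = Y_k(t) - \tilde{Y}_k(t)$ and
$\tilde{\epsilon}_k(t) = \tilde{Y}_k(t) - \hat{Y}_{k,a}(t)$. Let us first show
that $E_p(A_1(r,t)^2) \to 0$ when $N \to \infty$.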



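$$\begin{aligned}
E_p(A_1(r,t)^2)
&= \frac{n^2}{N^4}\sum_{k,l\in U}\sum_{k',l'\in U}\frac{\Delta_{kl}}{\pi_k\pi_l}\frac{\Delta_{k'l'}}{\pi_{k'}\pi_{l'}}
 E_p\left[\Big(\frac{\mathbb{1}_{kl}}{\pi_{kl}} - 1\Big)\Big(\frac{\mathbb{1}_{k'l'}}{\pi_{k'l'}} - 1\Big)\right]
 \tilde{e}_k(t)\tilde{e}_l(r)\tilde{e}_{k'}(t)\tilde{e}_{l'}(r) \\
&= \frac{n^2}{N^4}\sum_{k\in U}\sum_{k'\in U}\frac{1-\pi_k}{\pi_k}\frac{1-\pi_{k'}}{\pi_{k'}}
 E_p\left[\Big(\frac{\mathbb{1}_{k}}{\pi_{k}} - 1\Big)\Big(\frac{\mathbb{1}_{k'}}{\pi_{k'}} - 1\Big)\right]
 \tilde{e}_k(t)\tilde{e}_k(r)\tilde{e}_{k'}(t)\tilde{e}_{k'}(r) \\
&\quad + \frac{2n^2}{N^4}\sum_{k\in U}\sum_{k'\ne l'\in U}\frac{1-\pi_k}{\pi_k}\frac{\Delta_{k'l'}}{\pi_{k'}\pi_{l'}}
 E_p\left[\Big(\frac{\mathbb{1}_{k}}{\pi_{k}} - 1\Big)\Big(\frac{\mathbb{1}_{k'l'}}{\pi_{k'l'}} - 1\Big)\right]
 \tilde{e}_k(t)\tilde{e}_k(r)\tilde{e}_{k'}(t)\tilde{e}_{l'}(r) \\
&\quad + \frac{n^2}{N^4}\sum_{k\ne l\in U}\sum_{k'\ne l'\in U}\frac{\Delta_{kl}}{\pi_k\pi_l}\frac{\Delta_{k'l'}}{\pi_{k'}\pi_{l'}}
 E_p\left[\Big(\frac{\mathbb{1}_{kl}}{\pi_{kl}} - 1\Big)\Big(\frac{\mathbb{1}_{k'l'}}{\pi_{k'l'}} - 1\Big)\right]
 \tilde{e}_k(t)\tilde{e}_l(r)\tilde{e}_{k'}(t)\tilde{e}_{l'}(r) \\
&:= a_1 + a_2 + a_3. \qquad (37)
\end{aligned}$$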



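The hypotheses on the moments of the inclusion probabilities and Lemma A.6
give us

$$a_1 \le \left(\frac{n^2}{N^3\lambda^3} + \frac{n^2\max_{k\ne k'\in U}|\Delta_{kk'}|}{N^2\lambda^4}\right)\zeta_4$$

as well as

$$a_3 \le \frac{C}{N} + \frac{(n\max_{k\ne l\in U}|\Delta_{kl}|)^2}{\lambda^4\lambda^{*2}}\,
\max\left|E_p\{(\mathbb{1}_{kl} - \pi_{kl})(\mathbb{1}_{k'l'} - \pi_{k'l'})\}\right|\,\zeta_5$$

so that $a_1 \to 0$ and $a_3 \to 0$ when $N \to \infty$. Then, the Cauchy-Schwarz
inequality allows us to get that $a_2 \to 0$ when $N \to \infty$ and
$E_p(A_1(r,t)^2) \to 0$ when $N \to \infty$.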

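Let us show now that $E_p(|A_4(r,t)|) \to 0$ when $N \to \infty$. With Lemma A.4
and assumptions (A1)-(A5), we have

$$\begin{aligned}
E_p(|A_4(r,t)|)
&\le n\,E_p\left(\frac{1}{N^2}\sum_{k\in U}\sum_{l\in U}\frac{|\Delta_{kl}|}{\pi_k\pi_l\pi_{kl}}\,
 \|x_k\|\,\|x_l\|\,\|\tilde{\beta}(t) - \hat{\beta}_a(t)\|\,\|\tilde{\beta}(r) - \hat{\beta}_a(r)\|\right) \\
&\le \left(\frac{1}{N\lambda^2} + \frac{\max_{k\ne l\in U}|\Delta_{kl}|}{\lambda^2\lambda^*}\right)C_4\,\zeta_1
\end{aligned}$$

so that $E_p(|A_4(r,t)|) \to 0$ when $N \to \infty$.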
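In a similar way, we can bound $E_p(|A_2(r,t)|)$ as follows,

$$\begin{aligned}
E_p(|A_2(r,t)|)
&\le \frac{n}{N^2}\sum_{k\in U}\sum_{l\in U}\frac{|\Delta_{kl}|}{\pi_k\pi_l\pi_{kl}}\,E_p|\tilde{e}_k(t)\tilde{\epsilon}_l(r)| \\
&\le \frac{n}{N^2}\sum_{k\in U}\sum_{l\in U}\frac{|\Delta_{kl}|}{\pi_k\pi_l\pi_{kl}}\,|Y_k(t) - \tilde{Y}_k(t)|\,\|x_l\|\,E_p(\|\tilde{\beta}(r) - \hat{\beta}_a(r)\|) \\
&\le \left(\frac{n}{N\lambda^2} + \frac{n\max_{k\ne l\in U}|\Delta_{kl}|}{\lambda^2\lambda^*}\right)\sqrt{C_4}\,
 \Big(\frac{1}{N}\sum_{k\in U}|\tilde{e}_k(t)|\Big)\,E_p(\|\tilde{\beta}(r) - \hat{\beta}_a(r)\|)
\end{aligned}$$

where $\tilde{\epsilon}_k(t) = \tilde{Y}_k(t) - \hat{Y}_{k,a}(t) = x_k'(\tilde{\beta}(t) - \hat{\beta}_a(t))$.
Thus, there is a constant C such that

$$E_p(|A_2(r,t)|) \le \frac{C}{\sqrt{n}}$$

and $E_p(|A_2(r,t)|) \to 0$ when $N \to \infty$. We can show in a similar way that
$E_p(|A_3(r,t)|) \to 0$ when $N \to \infty$.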

Finally, we have that for all $(r,t) \in [0,T]^2$,

$$n\,E_p\{|\hat{\gamma}_{MA,a}(r,t) - \gamma_{MA}(r,t)|\} \to 0, \quad \text{when } N \to \infty. \qquad (38)$$



Step 2. Uniform convergence of the variance estimator

The pointwise convergence of the variance function proved in the previous step
clearly implies the convergence of all finite linear combinations: for all
$p \in \{1,2,\ldots\}$, for all $(c_1,\ldots,c_p) \in \mathbb{R}^p$ and for all
$(t_1,\ldots,t_p) \in [0,T]^p$, we have

$$\sum_{j=1}^{p} c_j\,n\left(\hat{\gamma}_{MA,a}(t_j,t_j) - \gamma_{MA}(t_j,t_j)\right) \to 0 \qquad (39)$$

in probability as N tends to infinity. Thus, we deduce with the Cramér-Wold
device that the vector
$n\,(\hat{\gamma}_{MA,a}(t_1,t_1) - \gamma_{MA}(t_1,t_1), \ldots, \hat{\gamma}_{MA,a}(t_p,t_p) - \gamma_{MA}(t_p,t_p))$
converges in distribution to 0 (in $\mathbb{R}^p$).

We need now to prove that the sequence of random functions
$n(\hat{\gamma}_{MA,a}(t,t) - \gamma_{MA}(t,t))$ is tight in $C[0,T]$ by using a
bound on its increments. Let us introduce the following criterion,

$$d^2(t,r) = n^2\,E_p\left(|\hat{\gamma}_{MA,a}(t,t) - \gamma_{MA}(t,t) - \hat{\gamma}_{MA,a}(r,r) + \gamma_{MA}(r,r)|^2\right).$$

To conclude, we show in the following that $d^2(t,r) \le C|t-r|^{2\beta}$ for a
constant C and all $(r,t) \in [0,T]^2$. Using (36), the distance is decomposed
into four parts.



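Let us define
$\tilde{\phi}_{kl}(t,r) = \tilde{e}_k(t)\tilde{e}_l(t) - \tilde{e}_k(r)\tilde{e}_l(r)$
and first consider $d_{A_1}^2 = E_p(|A_1(t,t) - A_1(r,r)|^2)$. We have

$$\begin{aligned}
d_{A_1}^2
&= \frac{n^2}{N^4}\sum_{k\in U}\sum_{k'\in U}\frac{1-\pi_k}{\pi_k}\frac{1-\pi_{k'}}{\pi_{k'}}
 E_p\left[\Big(\frac{\mathbb{1}_{k}}{\pi_{k}} - 1\Big)\Big(\frac{\mathbb{1}_{k'}}{\pi_{k'}} - 1\Big)\right]
 \tilde{\phi}_{kk}(t,r)\tilde{\phi}_{k'k'}(t,r) \\
&\quad + \frac{2n^2}{N^4}\sum_{k\in U}\sum_{k'\ne l'\in U}\frac{1-\pi_k}{\pi_k}\frac{\Delta_{k'l'}}{\pi_{k'}\pi_{l'}}
 E_p\left[\Big(\frac{\mathbb{1}_{k}}{\pi_{k}} - 1\Big)\Big(\frac{\mathbb{1}_{k'l'}}{\pi_{k'l'}} - 1\Big)\right]
 \tilde{\phi}_{kk}(t,r)\tilde{\phi}_{k'l'}(t,r) \\
&\quad + \frac{n^2}{N^4}\sum_{k\ne l\in U}\sum_{k'\ne l'\in U}\frac{\Delta_{kl}}{\pi_k\pi_l}\frac{\Delta_{k'l'}}{\pi_{k'}\pi_{l'}}
 E_p\left[\Big(\frac{\mathbb{1}_{kl}}{\pi_{kl}} - 1\Big)\Big(\frac{\mathbb{1}_{k'l'}}{\pi_{k'l'}} - 1\Big)\right]
 \tilde{\phi}_{kl}(t,r)\tilde{\phi}_{k'l'}(t,r) \\
&:= b_1 + b_2 + b_3. \qquad (40)
\end{aligned}$$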


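Thanks to Lemma A.8 we get

$$b_1 \le \left(\frac{n^2}{N^3\lambda^3} + \frac{n^2\max_{k\ne k'\in U}|\Delta_{kk'}|}{N^2\lambda^4}\right)\frac{1}{N}\sum_{k\in U}|\tilde{\phi}_{kk}(t,r)|^2
\le C|t-r|^{2\beta} \qquad (41)$$

and

$$b_3 \le \frac{C}{N}\,|t-r|^{2\beta} + \frac{(n\max_{k\ne l\in U}|\Delta_{kl}|)^2}{\lambda^4\lambda^{*2}}\,
\max\left|E_p\{(\mathbb{1}_{kl} - \pi_{kl})(\mathbb{1}_{k'l'} - \pi_{k'l'})\}\right|\,\zeta_9\,|t-r|^{2\beta}
\le C|t-r|^{2\beta}. \qquad (42)$$

The Cauchy-Schwarz inequality together with bounds (41) and (42) allows
us to get $b_2 \le C|t-r|^{2\beta}$ so that

$$d_{A_1}^2 \le C|t-r|^{2\beta}. \qquad (43)$$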



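Let us bound now $d_{A_2}^2 = E_p(|A_2(t,t) - A_2(r,r)|^2)$ and define
$\phi_{kl}(t,r) = \tilde{e}_k(t)\tilde{\epsilon}_l(t) - \tilde{e}_k(r)\tilde{\epsilon}_l(r)$.
Thanks to Lemma A.9 we get

$$d_{A_2}^2 \le \frac{2n^2}{N^2\lambda^4}\,E_p\left(\frac{1}{N}\sum_{k\in U}\phi_{kk}(t,r)^2\right)
 + \frac{2n^2\max_{k\ne l\in U}|\Delta_{kl}|^2}{\lambda^4\lambda^{*2}}\,E_p\left(\frac{1}{N^2}\sum_{k,l\in U}\phi_{kl}(t,r)^2\right)
\le C|t-r|^{2\beta}. \qquad (44)$$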

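Let us study now the last term, $d_{A_4}^2 = E_p(|A_4(t,t) - A_4(r,r)|^2)$, and
define
$\hat{\phi}_{kl}(t,r) = \tilde{\epsilon}_k(t)\tilde{\epsilon}_l(t) - \tilde{\epsilon}_k(r)\tilde{\epsilon}_l(r)$.
Thanks to Lemma A.7 we have

$$d_{A_4}^2 \le \frac{2n^2}{N^2\lambda^4}\,E_p\left(\frac{1}{N}\sum_{k\in U}\hat{\phi}_{kk}(t,r)^2\right)
 + \frac{2n^2\max_{k\ne l\in U}|\Delta_{kl}|^2}{\lambda^4\lambda^{*2}}\,E_p\left(\frac{1}{N^2}\sum_{k,l\in U}\hat{\phi}_{kl}(t,r)^2\right)
\le C|t-r|^{2\beta}. \qquad (45)$$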
Finally, we can deduce, with inequalities (36), (43), (44) and (45), that

$$d^2(t,r) = n^2\,E_p\left(|\hat{\gamma}_{MA,a}(t,t) - \gamma_{MA}(t,t) - \hat{\gamma}_{MA,a}(r,r) + \gamma_{MA}(r,r)|^2\right)
\le C|t-r|^{2\beta}. \qquad (46)$$

The end of the proof is a direct application of Theorem 12.3 of Billingsley
(1968). Since $\beta > 1/2$, the sequence
$n(\hat{\gamma}_{MA,a}(t,t) - \gamma_{MA}(t,t))$ is tight in $C([0,T])$ and
converges in distribution to 0. The proof is complete with a direct application
of the definition of weak convergence in $C([0,T])$ considering the bounded
and continuous "sup" functional. □



A.4. Proofs related to the asymptotic normality and the confidence
bands

The steps of the proof of Proposition 3.4 are similar to the steps of the proof
of Proposition 3.3. We first examine the finite combinations and invoke the
Cramér-Wold device. Then we prove the tightness thanks to inequalities on the
increments.

Let us first deal with the interpolation error, which is negligible under the
assumption on the grid of discretization points, as shown in (26).

Then, in light of (10), Lemma A.2 and Lemma A.4 we clearly have that, for
each value of t,

$$\sqrt{n}\,(\hat{\mu}_{MA,a}(t) - \tilde{\mu}(t)) = o_p(1),$$

and consequently, as n tends to infinity,

$$\sqrt{n}\,(\hat{\mu}_{MA,a}(t) - \mu(t)) \to \mathcal{N}(0, \gamma_Z(t,t)) \quad \text{in distribution,}$$

where the covariance function $\gamma_Z$ of Z, which is defined in (12),
satisfies $\gamma_Z = \lim_{N\to\infty} n\,\gamma_{MA}$.

If we now consider p distinct discretization instants
$0 < t_1 < t_2 < \ldots < t_p < T$, it is immediate to check that for any vector
$c \in \mathbb{R}^p$,

$$\sqrt{n}\,\Big(\sum_{j=1}^{p} c_j(\hat{\mu}_{MA,a}(t_j) - \mu(t_j))\Big) \to \mathcal{N}(0, \sigma_c^2)$$

where

$$\sigma_c^2 = \sum_{j=1}^{p}\sum_{l=1}^{p} c_j c_l\,\gamma_Z(t_j,t_l).$$

Indeed, by linearity, there exists a vector of random weights
$(w_1,\ldots,w_N)$ which does not depend on time t such that

$$\hat{\mu}_{MA,a}(t) = \sum_{k\in U} w_k Y_k(t)$$

and $\sum_{j=1}^{p} c_j\hat{\mu}_{MA,a}(t_j) = \sum_{k\in U} w_k\big(\sum_{j=1}^{p} c_j Y_k(t_j)\big)$
also satisfies a CLT, with asymptotic variance $\sigma_c^2$, under the moment
conditions (A7). Thus, any finite linear combination is asymptotically Gaussian
and we can conclude that the vector
$\sqrt{n}\,(\hat{\mu}_{MA,a}(t_1) - \mu(t_1), \ldots, \hat{\mu}_{MA,a}(t_p) - \mu(t_p))$
is asymptotically Gaussian with the Cramér-Wold device.

It remains to check the tightness of the functional process and this is a direct
consequence of (30) and (35). Indeed, denoting by
$Z_n(t) = \sqrt{n}\,(\hat{\mu}_{MA,a}(t) - \mu(t))$, there is a constant C such
that, for all $(r,t) \in [0,T]^2$,

$$E_p\left(|Z_n(t) - Z_n(r)|^2\right) \le C|t-r|^{2\beta},$$

and, since $\beta > 1/2$, the sequence $Z_n$ is tight in $C[0,T]$, in view of
Theorem 12.3 of Billingsley (1968). □



We prove now Proposition 3.5, the last result of the paper. The proof consists
in showing the weak convergence of the sequence of distributions $(\hat{Z}_N)$
to the law of Z in $C([0,T])$.

For any vector of p points $0 < t_1 < \ldots < t_p < T$, the finite dimensional
convergence of the distribution of the Gaussian vector
$(\hat{Z}_N(t_1), \ldots, \hat{Z}_N(t_p))$ to the distribution of
$(Z(t_1), \ldots, Z(t_p))$ is an immediate consequence of the uniform
convergence of the covariance function stated in Proposition 3.3. We can
conclude with Slutsky's Lemma noting that for any
$(c_1,\ldots,c_p) \in \mathbb{R}^p$,

$$n\sum_{j=1}^{p}\sum_{l=1}^{p} c_j c_l\,\hat{\gamma}_{MA,d}(t_j,t_l) \to \sum_{j=1}^{p}\sum_{l=1}^{p} c_j c_l\,\gamma_Z(t_j,t_l) \quad \text{in probability.} \qquad (47)$$
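Now, we need to check the tightness of $(\hat{Z}_N)$ in $C([0,T])$. Given
$\hat{\gamma}_{MA,d}$, we have for $(r,t) \in [0,T]^2$,

$$E\left((\hat{Z}_N(t) - \hat{Z}_N(r))^2 \mid \hat{\gamma}_{MA,d}\right) = n\left(\hat{\gamma}_{MA,d}(t,t) - 2\hat{\gamma}_{MA,d}(r,t) + \hat{\gamma}_{MA,d}(r,r)\right)$$

and after some algebra, we obtain thanks to Assumption (A2) that

$$E\left((\hat{Z}_N(t) - \hat{Z}_N(r))^2 \mid \hat{\gamma}_{MA,d}\right) \le \frac{C}{N}\sum_{k\in U}\left[(Y_{k,d}(t) - Y_{k,d}(r))^2 + (\hat{Y}_{k,d}(t) - \hat{Y}_{k,d}(r))^2\right]. \qquad (48)$$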



Let us first study the term $\frac{1}{N}\sum_{k\in U}(Y_{k,d}(t) - Y_{k,d}(r))^2$
in the previous inequality and without loss of generality suppose that
$t > r$. To check the continuity of the trajectories, we only need to consider
points r and t that are close to each other. If t and r belong to the same
interval, say $[t_i, t_{i+1}]$, then it is easy to check, with Assumption (A4),
that

$$\begin{aligned}
\frac{1}{N}\sum_{k\in U}(Y_{k,d}(t) - Y_{k,d}(r))^2
&= \frac{(t-r)^2}{(t_{i+1} - t_i)^2}\,\frac{1}{N}\sum_{k\in U}(Y_k(t_{i+1}) - Y_k(t_i))^2 \\
&\le C\,\frac{(t-r)^2}{(t_{i+1} - t_i)^{2-2\beta}} \\
&\le C(t-r)^{2\beta}. \qquad (49)
\end{aligned}$$
If we suppose now that $r \in [t_{i-1}, t_i]$ and $t \in [t_i, t_{i+1}]$, then we
have

$$\begin{aligned}
\frac{|Y_{k,d}(t) - Y_{k,d}(r)|}{t-r}
&\le \max\left\{\frac{|Y_k(t_{i+1}) - Y_k(t_i)|}{t_{i+1} - t_i}, \frac{|Y_k(t_i) - Y_k(t_{i-1})|}{t_i - t_{i-1}}\right\} \\
&\le \frac{|Y_k(t_{i+1}) - Y_k(t_i)|}{t_{i+1} - t_i} + \frac{|Y_k(t_i) - Y_k(t_{i-1})|}{t_i - t_{i-1}}
\end{aligned}$$

and using the same decomposition as in (49), we directly get that

$$\frac{1}{N}\sum_{k\in U}(Y_{k,d}(t) - Y_{k,d}(r))^2 \le C(t-r)^{2\beta}.$$



The second term at the right-hand side of inequality (48) is dealt with using
similar arguments and the decomposition used in the proof of Lemma A.3, so that

$$\frac{1}{N}\sum_{k\in U}(\hat{Y}_{k,d}(t) - \hat{Y}_{k,d}(r))^2 \le C|t-r|^{2\beta}.$$

Thus, the trajectories of the Gaussian process are continuous on $[0,T]$
whenever $\beta > 0$ (see e.g. Theorem 1.4.1 in Adler and Taylor (2007)) and the
sequence $(\hat{Z}_N)$ converges weakly to Z in $C([0,T])$ equipped with the
supremum norm. Using again Proposition 3.3 we have, uniformly in t,
$\hat{\sigma}_Z^2(t) = \sigma_Z^2(t) + o_p(1)$, where
$\hat{\sigma}_Z^2(t) = n\,\hat{\gamma}_{MA,d}(t,t)$. Since, by hypothesis,
$\sigma_Z^2(t) = \gamma_Z(t,t)$ is a continuous function and
$\inf_t \gamma_Z(t,t) > 0$, we get with Slutsky's lemma that
$(\hat{Z}_N/\hat{\sigma}_Z)$ converges weakly to $Z/\sigma_Z$ in $C([0,T])$. By
definition of the weak convergence in $C([0,T])$ and the continuous mapping
theorem, we also deduce that the real random variable
$\sup_{t\in[0,T]}|\hat{Z}_N(t)|/\hat{\sigma}_Z(t)$ converges in distribution to
$M = \sup_{t\in[0,T]}|Z(t)|/\sigma_Z(t)$, so that for each $c > 0$,

$$P\Big(\sup_{t\in[0,T]}|\hat{Z}_N(t)|/\hat{\sigma}_Z(t) \le c\Big) \to P\Big(\sup_{t\in[0,T]}|Z(t)|/\sigma_Z(t) \le c\Big).$$

Note finally that, under the previous hypotheses on $\gamma_Z$ (see e.g. Pitt and
Tran (1979)), the real random variable $M = \sup_{t\in[0,T]}(|Z(t)|/\sigma_Z(t))$
has an absolutely continuous and bounded density function so that the
convergence holds uniformly in c (see e.g. Lemma 2.11 in van der Vaart (1998)). □
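The result just proved is what justifies approximating the law of the supremum
by simulation. A minimal sketch of such a simulation is given below, assuming
that an estimated covariance matrix on the discretization grid is available.
The function name `band_quantile`, the number of simulations and the synthetic
covariance used in the example are assumptions made for illustration and do not
reproduce the authors' implementation.

```python
import numpy as np

def band_quantile(gamma, alpha=0.05, n_sim=10000, seed=0):
    """Monte Carlo approximation of c such that
    P(sup_t |Z(t)| / sigma(t) <= c) = 1 - alpha,
    for a centred Gaussian vector Z with covariance matrix gamma."""
    gamma = np.asarray(gamma, dtype=float)
    sigma = np.sqrt(np.diag(gamma))
    # Matrix square root via eigendecomposition; tiny negative eigenvalues
    # caused by rounding are clipped to zero.
    eta, V = np.linalg.eigh(gamma)
    root = V * np.sqrt(np.clip(eta, 0.0, None))
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_sim, gamma.shape[0])) @ root.T   # rows ~ N(0, gamma)
    sup_stat = np.max(np.abs(Z) / sigma, axis=1)                # one supremum per draw
    return np.quantile(sup_stat, 1.0 - alpha)

# Synthetic covariance on a grid of 50 instants (illustrative only).
t = np.linspace(0.0, 1.0, 50)
gamma = np.exp(-np.abs(t[:, None] - t[None, :]))
c_band = band_quantile(gamma)
print(c_band)
```

With such a value c, a band of the form
$\hat{\mu}_{MA,d}(t) \pm c\,\hat{\sigma}_Z(t)/\sqrt{n}$ would have asymptotic
coverage close to $1 - \alpha$ under the assumptions of the proposition. The
value returned is larger than the pointwise Gaussian quantile 1.96, which
reflects the simultaneous nature of the band.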



A.5. Some useful lemmas



We state here without any proof some results that are needed for the study of the 
convergence of the covariance function. They rely on applications of the Cauchy- 
Schwarz inequality and on the assumptions on the moments of the trajectories 
and the inclusion probabilities. 

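Lemma A.6. Assume (A2)-(A5) and (A7) hold. There are two constants $\zeta_4$
and $\zeta_5$ such that

$$\frac{1}{N}\sum_{k\in U}\tilde{e}_k(t)^2\,\tilde{e}_k(r)^2 \le \zeta_4$$

and

$$\frac{1}{N^2}\sum_{k\in U}\sum_{l\in U}\tilde{e}_k(t)^2\,\tilde{e}_l(r)^2 \le \zeta_5$$

where $\tilde{e}_k(t) = Y_k(t) - \tilde{Y}_k(t)$.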

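Lemma A.7. Assume (A2)-(A5) and (A7) hold. There are two constants $\zeta_6$
and $\zeta_7$ such that

$$E_p\left(\frac{1}{N}\sum_{k\in U}\hat{\phi}_{kk}(t,r)^2\right) \le \zeta_6\,|t-r|^{2\beta}$$

and

$$E_p\left(\frac{1}{N^2}\sum_{k,l\in U}\hat{\phi}_{kl}(t,r)^2\right) \le \zeta_7\,|t-r|^{2\beta}$$

where
$\hat{\phi}_{kl}(t,r) = \tilde{\epsilon}_k(t)\tilde{\epsilon}_l(t) - \tilde{\epsilon}_k(r)\tilde{\epsilon}_l(r)$
and $\tilde{\epsilon}_k(t) = \tilde{Y}_k(t) - \hat{Y}_{k,a}(t)$.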

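Lemma A.8. Assume (A2)-(A5) and (A7) hold. There are two constants $\zeta_8$
and $\zeta_9$ such that

$$\frac{1}{N}\sum_{k\in U}\tilde{\phi}_{kk}(t,r)^2 \le \zeta_8\,|t-r|^{2\beta}$$

and

$$\frac{1}{N^2}\sum_{k,l\in U}\tilde{\phi}_{kl}(t,r)^2 \le \zeta_9\,|t-r|^{2\beta}$$

where
$\tilde{\phi}_{kl}(t,r) = \tilde{e}_k(t)\tilde{e}_l(t) - \tilde{e}_k(r)\tilde{e}_l(r)$
and $\tilde{e}_k(t) = Y_k(t) - \tilde{Y}_k(t)$.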

Lemma A.9. Assume (A2)-(A5) and (A7) hold. There are two constants
$\zeta_{10}$ and $\zeta_{11}$ such that

$$E_p\left(\frac{1}{N}\sum_{k\in U}\phi_{kk}(t,r)^2\right) \le \zeta_{10}\,|t-r|^{2\beta}$$

and

$$E_p\left(\frac{1}{N^2}\sum_{k,l\in U}\phi_{kl}(t,r)^2\right) \le \zeta_{11}\,|t-r|^{2\beta}$$

where
$\phi_{kl}(t,r) = \tilde{e}_k(t)\tilde{\epsilon}_l(t) - \tilde{e}_k(r)\tilde{\epsilon}_l(r)$,
$\tilde{e}_k(t) = Y_k(t) - \tilde{Y}_k(t)$ and
$\tilde{\epsilon}_k(t) = \tilde{Y}_k(t) - \hat{Y}_{k,a}(t)$.